Skills
Experience
Visual Inference Lab, TU Darmstadt
https://www.visinf.tu-darmstadt.de
Researcher at Visual Inference Lab, working on robust semantic analysis of heterogeneous scenes from multi-modal data streams
IT Consultant
http://www.patrip.org
for PATRIP Foundation: planning, implementation, and maintenance of a data exchange platform
SysAdmin
at several companies: administration of Linux/Unix servers, setup and management of high-availability servers and load balancers
Publications
Adapters Strike Back
Jan-Martin O. Steitz and Stefan Roth,
in
Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2024.
Adapters provide an efficient and lightweight mechanism for adapting trained transformer models to a variety of different tasks. However, they have often been found to be outperformed by other …
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, and Iryna Gurevych,
in
Findings of the Association for Computational Linguistics (ACL),
2022.
Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling …
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz, Jonas Pfeiffer, Iryna Gurevych, and Stefan Roth,
in
Proc. of the 43rd DAGM German Conference on Pattern Recognition (GCPR),
2021, Best Paper Honorable Mention.
Reasoning over multiple modalities, e.g. in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, …