Gengyuan Zhang

pronoun: he/him

Hi, I am Gengyuan (张耕源). I am currently pursuing my PhD at Ludwig Maximilian University of Munich (also known as LMU Munich or the University of Munich), supervised by Prof. Volker Tresp.

My research interests include Video Understanding and Multimodal Reasoning, at the intersection of Computer Vision and Natural Language Processing.

Prior to this, I earned my bachelor's degree (2018) at Zhejiang University, China, and my master's degree (2021) at the Technical University of Munich, Germany.

I am originally from Hunan, China.

I am open to collaborations and full-time job opportunities.







news

Apr 7, 2025 I started my internship @Amazon London!
Mar 5, 2025 One paper accepted at the ICLR 2025 Workshop on World Models.
Feb 26, 2025 Two papers accepted at CVPR 2025! See you in Nashville.
Feb 20, 2025 Our new paper is now on arXiv: Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs!
Oct 28, 2024 One new paper accepted at WACV 2025 in Tucson, Arizona!


selected publications

  1. Localizing Events in Videos with Multimodal Queries
    Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, and 5 more authors
    arXiv preprint arXiv:2406.10079, 2024
  2. Multi-event Video-Text Retrieval
    Gengyuan Zhang, Jisen Ren, Jindong Gu, and 1 more author
    In Proceedings of the IEEE/CVF International Conference on Computer Vision 2023, 2023
  3. Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
    Gengyuan Zhang, Mingcong Ding, Tong Liu, and 2 more authors
    2025
  4. Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries
    Roberto Amoroso*, Gengyuan Zhang*, Rajat Koner, and 4 more authors
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025, 2025
  5. Time-dependent Entity Embedding is not All You Need: A Re-evaluation of Temporal Knowledge Graph Completion Models under a Unified Framework
    Zhen Han*, Gengyuan Zhang*, Yunpu Ma, and 1 more author
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021
  6. Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
    Gengyuan Zhang, Yurui Zhang, Kerui Zhang, and 1 more author
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Nov 2024