publications

publications in reverse chronological order, grouped by year.

2025

  1. Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries
    Roberto Amoroso*, Gengyuan Zhang*, Rajat Koner, and 4 more authors
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

2024

  1. Localizing Events in Videos with Multimodal Queries
    Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, and 5 more authors
    arXiv preprint arXiv:2406.10079, 2024
  2. Multimodal Pragmatic Jailbreak on Text-to-image Models
    Tong Liu, Zhixin Lai, Gengyuan Zhang, and 4 more authors
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
  3. VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
    Ruotong Liao, Max Erler, Huiyu Wang, and 4 more authors
    2024
  4. FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
    Haokun Chen, Hang Li, Yao Zhang, and 6 more authors
    arXiv preprint arXiv:2410.04810, 2024

2023

  1. Multi-event Video-Text Retrieval
    Gengyuan Zhang, Jisen Ren, Jindong Gu, and 1 more author
    In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
  2. SPOT! Revisiting Video-Language Models for Event Understanding
    Gengyuan Zhang, Jinhe Bi, Jindong Gu, and 1 more author
    arXiv preprint arXiv:2311.12919, 2023
  3. Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
    Gengyuan Zhang, Yurui Zhang, Kerui Zhang, and 1 more author
    arXiv preprint arXiv:2307.06166, 2023
  4. A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
    Jindong Gu, Zhen Han, Shuo Chen, and 7 more authors
    arXiv preprint arXiv:2307.12980, 2023

2022

  1. CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
    Yao Zhang, Haokun Chen, Ahmed Frikha, and 5 more authors
    arXiv preprint arXiv:2211.10567, 2022

2021

  1. Time-dependent Entity Embedding is not All You Need: A Re-evaluation of Temporal Knowledge Graph Completion Models under a Unified Framework
    Zhen Han*, Gengyuan Zhang*, Yunpu Ma, and 1 more author
    In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Nov 2021