Events
Global AI Frontier Lab, Seminar Series: Video as a Knowledge Source: Video-based Retrieval-Augmented Generation in Diverse Scenarios
Speaker: Kangsan Kim
Location: 1 MetroTech Center, 22nd Floor
Date: Monday, September 8, 2025
Dinner and networking will begin at 6:00 PM, and the seminar will start at 7:00 PM ET. The seminar will be held at the Global AI Frontier Lab at 1 MetroTech Center, Brooklyn, NY 11201. This event will be held both in person and online. In-person attendance is strongly encouraged for Lab researchers in NYC. Please RSVP by filling out this Google Form. For online attendees, a Zoom link will be sent out prior to the event. Please reach out to global-ai-frontier-lab@nyu.edu with any questions.
Abstract: Retrieval-Augmented Generation (RAG) is a powerful strategy for improving the accuracy of models by retrieving external knowledge relevant to queries and incorporating it into the generation process. However, existing approaches primarily focus on text and images, and they largely overlook videos, a rich source of multimodal knowledge capable of representing contextual details. In this talk, we explore how videos can be leveraged in context to provide helpful information across diverse scenarios. We first aim to enhance LLMs’ ability to understand out-of-distribution videos using in-context learning with relevant example videos. To address the limitation of context length, we introduce confidence-based iterative inference, which outperforms existing baselines by leveraging a larger set of video examples. We then expand this video-augmented response generation approach to general user queries. For a given text query, our framework retrieves relevant videos and leverages both their visual and textual content through adaptive frame selection. Lastly, we investigate the future potential of video-based RAG in real-world settings, with a focus on question answering over days-long egocentric videos collected by multiple embodied agents.
Bio: Kangsan Kim is a second-year PhD student at KAIST AI, advised by Sung Ju Hwang. He holds a B.S. in Computer Science from KAIST and is currently visiting NYU under Mengye Ren. His research focuses on video understanding, multimodal RAG, and multi-agent systems. Personal homepage: https://kangsankim07.github.io