Multi-Modal Learning: What, How and When?

Speaker: Divyam Madaan

Location: 1 MetroTech Center, Room Global AI Frontier Lab (22nd Floor)

Date: Monday, July 28, 2025

Abstract: Building multi-modal models that can learn from diverse data remains a significant challenge despite extensive research into new benchmarks and architectures. This talk argues that we might be stuck because we are working with incomplete assumptions. We begin by asking: what is multi-modal learning? Through the lens of generative models, we define it as the combination of two dependencies: inter-modality (between modalities and the target task label) and intra-modality (between a single modality and the target task label). We validate this definition with a systematic approach across 20 commonly used benchmarks for large-scale multi-modal models and multiple architectures. Next, we ask how to model these dependencies. We introduce the I2M2 framework, which jointly captures both dependency types. Finally, we tackle the practical question of when a multi-modal approach is necessary. Given the challenges of acquiring paired data, we will present our ongoing work to quantify the impact of missing modalities on downstream performance.

Bio: Divyam Madaan is a fourth-year Ph.D. student at New York University, advised by Sumit Chopra and Kyunghyun Cho. His research focuses on developing models that can effectively learn from multiple modalities and generalize across distribution shifts, with a special emphasis on healthcare applications. Prior to NYU, he earned his M.S. in Computer Science from KAIST, where he worked on model robustness against adversarial examples and continual adaptation to evolving data and architectures. His work has been published at leading venues including ICML, NeurIPS, CVPR and ICLR, where he has also been recognized with oral and spotlight presentation awards.

Registration Required: Please RSVP by filling out this Google Form. Dinner & networking will begin at 6:00 PM and the seminar will start at 7:00 PM ET.