CILVR SEMINAR: A Framework for Multi-modal Learning: Jointly Modeling Inter- and Intra-Modality Dependencies | Variance-Covariance Regularization Improves Representation Learning

Speaker: Divyam Madaan, Jiachen Zhu

Location: 60 Fifth Avenue, 7th Floor Common Area
Videoconference link: https://nyu.zoom.us/s/94530532064

Date: Thursday, March 14, 2024

Talk 1
Title: A Framework for Multi-modal Learning: Jointly Modeling Inter- and Intra-Modality Dependencies

Abstract: Supervised multi-modal learning is a key paradigm in machine learning that involves mapping multiple input modalities to a target label. However, its effectiveness varies greatly across applications. In this talk, I will examine the factors behind these performance fluctuations in multi-modal learning and introduce a framework designed to mitigate these disparities. I will demonstrate how traditional methods, which typically concentrate on either the inter-modality dependencies (the relationships between different input modalities and the label) or the intra-modality dependencies (the relationships within a single input modality and the label), may not reliably achieve optimal predictive results. To tackle this challenge, I will introduce our Inter+Intra-Modality (I2M) modeling, which combines inter- and intra-modality dependencies to enhance prediction accuracy. Our findings, drawn from real-world applications in healthcare and vision-and-language tasks, indicate that our approach outperforms traditional methods that focus solely on one type of modality dependency.
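To make the inter/intra distinction concrete, here is a minimal, hypothetical sketch of how one might combine both kinds of dependency in a single objective: per-modality heads (intra) each predict the label from one modality, while a joint head (inter) predicts from the concatenated modalities, and the losses are summed. All names, shapes, and the weighting scheme are illustrative assumptions, not the authors' actual I2M formulation.

```python
import numpy as np

def softmax_ce(logits, y):
    # Cross-entropy loss for a single example.
    logits = logits - logits.max()  # for numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[y])

def inter_intra_loss(x1, x2, y, W1, W2, Wj, lam=1.0):
    """Hypothetical combined objective (illustrative only).

    W1, W2: intra-modality classifier heads, one per input modality.
    Wj:     inter-modality head over the concatenated modalities.
    lam:    assumed trade-off weight between the two dependency types.
    """
    intra = softmax_ce(W1 @ x1, y) + softmax_ce(W2 @ x2, y)
    inter = softmax_ce(Wj @ np.concatenate([x1, x2]), y)
    return inter + lam * intra
```

Setting lam to zero recovers a purely inter-modality model; dropping the joint term recovers purely intra-modality modeling, which is the contrast the talk draws.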

Bio: Divyam Madaan is a third-year PhD candidate at NYU, advised by Sumit Chopra and Kyunghyun Cho. His current research focuses on developing models that can learn from various modalities and adapt to changes in the data distribution. Before joining NYU, he received his Master's degree from KAIST, where he explored the robustness of machine learning models to adversarial examples and their ability to continually adapt to changes in data and architectures.


Talk 2
Title: Variance-Covariance Regularization Improves Representation Learning

Abstract: Transfer learning plays a key role in advancing machine learning models, yet conventional supervised pretraining often undermines feature transferability by prioritizing features that minimize the pretraining loss. This work adapts a self-supervised learning regularization technique from the VICReg method to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg). This adaptation encourages the network to learn high-variance, low-covariance representations, promoting the learning of more diverse features. Through extensive empirical evaluation, we demonstrate that our method significantly enhances transfer learning for images and videos and improves performance in scenarios like long-tail learning and hierarchical classification. Additionally, we show its effectiveness may stem from its success in addressing challenges like gradient starvation and neural collapse.
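For readers unfamiliar with the VICReg-style penalty the abstract builds on, here is a minimal NumPy sketch of a variance-covariance regularizer on a batch of representations: a hinge term pushes each feature's standard deviation up toward a target, and an off-diagonal covariance term pushes features toward decorrelation. The function name, hyperparameter names (gamma, eps), and weighting are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def vcreg_penalty(z, gamma=1.0, eps=1e-4):
    """Variance-covariance penalty on a batch of representations z of shape (N, D).

    Returns (variance_loss, covariance_loss); in training these would be
    weighted and added to the supervised loss. Illustrative sketch only.
    """
    n, d = z.shape
    z = z - z.mean(axis=0)                            # center each feature
    std = np.sqrt(z.var(axis=0) + eps)                # per-feature std (eps for stability)
    var_loss = np.mean(np.maximum(0.0, gamma - std))  # hinge: encourage std >= gamma
    cov = (z.T @ z) / (n - 1)                         # feature covariance matrix
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = np.sum(off_diag ** 2) / d              # penalize off-diagonal covariance
    return var_loss, cov_loss
```

Intuitively, the variance term prevents features from collapsing to constants (one face of neural collapse), while the covariance term discourages redundant, correlated features, which is how the regularizer promotes more diverse representations.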

Bio: Jiachen Zhu is currently a fourth-year Computer Science PhD candidate at NYU Courant, advised by Prof. Yann LeCun. His research is centered on developing advanced self-supervised learning methods for images and videos, aiming to leverage a vast collection of unlabeled data to achieve superior representation learning for various downstream tasks. Additionally, he works on applying insights from self-supervised learning to enhance supervised learning and innovate in neural network architecture, with the goal of advancing both supervised and self-supervised learning domains.