CILVR Seminar: Dependence Induced Representation Learning

Speaker: Xiangxiang Xu

Location: 60 Fifth Avenue, 7th floor open space
Videoconference link: https://nyu.zoom.us/s/91086220972

Date: Wednesday, November 13, 2024

Despite the vast progress in deep learning practice, our theoretical understanding of learned feature representations remains limited. In this talk, we discuss three fundamental questions from a unified statistical perspective:
(1) What representations carry useful information?
(2) How are representations learned from distinct algorithms related?
(3) Can we separate representation learning from solving specific tasks?
We formalize representations that extract statistical dependence from data, termed dependence-induced representations. We prove that representations are dependence-induced if and only if they can be learned from specific features defined by Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. This separation theorem highlights the key role of HGR features in representation learning and enables a modular design of learning algorithms. Specifically, we demonstrate the optimality of HGR features in simultaneously achieving different design objectives, including minimal sufficiency (Tishby's information bottleneck), information maximization, enforcing uncorrelated features (VICReg), and encoding information at various granularities (Matryoshka representation learning).

We further illustrate that by adapting HGR features, we can obtain the representations learned by distinct practices, from cross-entropy or hinge loss minimization, non-negative feature learning, and neural density ratio estimation to their regularized variants. We also discuss applications of our analysis in interpreting learning phenomena such as neural collapse, understanding existing self-supervised learning practices, and obtaining more flexible designs, e.g., inference-time hyperparameter tuning.
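
For readers unfamiliar with HGR maximal correlation, the sketch below (not part of the talk materials) illustrates the standard definition in the simplest finite-alphabet setting: the maximal correlation of (X, Y) is the second-largest singular value of the matrix with entries P(x, y) / sqrt(P(x) P(y)), and the corresponding singular vectors yield the maximally correlated features. The function name and example distribution here are illustrative only.

```python
import numpy as np

def hgr_maximal_correlation(joint):
    """HGR maximal correlation of (X, Y) from a joint pmf table.

    joint[x, y] = P(X = x, Y = y). For finite alphabets, the maximal
    correlation equals the second-largest singular value of the matrix
    Q[x, y] = P(x, y) / sqrt(P(x) P(y)); the corresponding singular
    vectors give the maximally correlated (HGR) features.
    """
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)                      # marginal P(X)
    py = joint.sum(axis=0)                      # marginal P(Y)
    q = joint / np.sqrt(np.outer(px, py))
    u, s, vt = np.linalg.svd(q)                 # singular values in descending order
    rho = s[1]                                  # s[0] = 1 corresponds to constant features
    f = u[:, 1] / np.sqrt(px)                   # zero-mean, unit-variance feature of X
    g = vt[1, :] / np.sqrt(py)                  # zero-mean, unit-variance feature of Y
    return rho, f, g

# Example: Y is X passed through a binary symmetric channel with flip probability 0.1.
pxy = np.array([[0.45, 0.05],
                [0.05, 0.45]])
rho, f, g = hgr_maximal_correlation(pxy)
print(rho)  # 0.8 = 1 - 2 * 0.1
```

In the settings the abstract describes, the joint distribution is not available in closed form, so HGR features are instead learned from samples (e.g., with neural networks) rather than computed by an explicit SVD as in this toy example.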