Principles of Large-Scale Foundation Models

Speaker: Yifei Wang, MIT

Location: 60 Fifth Avenue, 7th Floor Open Space

Date: Wednesday, February 5, 2025

Large-scale foundation models (e.g., GPT) have reached unprecedented levels of performance by relying on two emerging learning mechanisms: massive self-supervised learning (SSL) and flexible test-time learning (TTL). Yet these paradigms remain poorly understood, leading to considerable trial and error in practice. In this talk, I demonstrate how identifying the underlying principles of SSL and TTL can inform more effective and reliable model design. First, I introduce a unifying graph-theoretic framework that characterizes the generalization of both discriminative and generative SSL models. This framework explains how seemingly disparate approaches converge on meaningful semantic representations without labels, and it points to practical strategies for improving model efficiency, robustness, and interpretability. Second, I investigate how test-time learning works in language models, particularly the long, reflective reasoning processes seen in models such as o1 and r1, and provide both theoretical insights and scalable designs to enhance model capabilities and safety. By bridging rigorous theoretical understanding with practical algorithmic solutions, this research offers a cohesive path toward building more principled, interpretable, and trustworthy foundation models.
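
For readers unfamiliar with graph-based views of SSL, the sketch below is a minimal, illustrative example of one objective in this family: the spectral contrastive loss, which learns embeddings whose inner products approximate the adjacency structure of an augmentation graph (two samples are connected if they can arise as augmentations of the same datum). It is not the speaker's specific framework; the function name, the PyTorch setup, and the in-batch approximation of the negative term are assumptions made here for illustration.

    import torch

    def spectral_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
        # z1, z2: encoder outputs for two augmented views of the same batch, shape (batch, dim).
        # Positive term: pull embeddings of paired augmentations (graph neighbors) together.
        pos = -2.0 * (z1 * z2).sum(dim=1).mean()
        # Negative term: penalize squared similarities across the batch, an in-batch
        # approximation to matching the (low-rank) augmentation-graph adjacency.
        sim = z1 @ z2.T
        neg = (sim ** 2).mean()
        return pos + neg

In practice one would feed two independently augmented views of a batch through a shared encoder and minimize this loss; objectives of this kind are what a graph-theoretic analysis can relate to downstream generalization, since the learned embeddings approximate spectral structure of the augmentation graph.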