MIC Seminar: Understanding the Mechanisms of Fast Hyperparameter Transfer
Speaker: Nikhil Ghosh (Flatiron Institute)
Location: 60 Fifth Avenue, Room 150
Date: Tuesday, May 12, 2026
Abstract: The growing scale of deep learning models has rendered standard hyperparameter (HP) optimization prohibitively expensive. A promising solution is the use of scale-aware hyperparameters, which can enable direct transfer of optimal HPs from small-scale grid searches to large models with minimal performance loss. To understand the principles governing such transfer strategies, we develop a conceptual framework for reasoning about HP transfer across scale. In synthetic settings, we present quantitative examples where transfer either offers a provable computational advantage or fails even under muP. To explain the fast transfer observed in practice, we conjecture that decomposing the optimization trajectory reveals two contributions to loss reduction: (1) a width-stable component that determines the optimal HPs and (2) a width-sensitive component that improves with width but weakly perturbs the HP optimum.

We present empirical evidence for this hypothesis in large language model pretraining.
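For readers unfamiliar with the transfer recipe the abstract refers to, the sketch below is a minimal illustration (not the speaker's code) of muP-style hyperparameter transfer, assuming the commonly cited muP rules for Adam: hidden- and output-layer learning rates shrink like 1/width while the input layer's stays fixed, and the readout initialization is scaled down by an extra factor of width. The toy MLP, data, widths, and learning-rate grid are all illustrative assumptions.

```python
# Minimal sketch (illustrative only): muP-style hyperparameter transfer
# for a toy MLP, assuming the commonly cited muP rules for Adam.
import torch
import torch.nn as nn

def make_mlp(width: int, d_in: int = 32, d_out: int = 1) -> nn.Sequential:
    model = nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )
    # muP-style init: weights ~ N(0, 1/fan_in); biases start at zero.
    for layer in model:
        if isinstance(layer, nn.Linear):
            nn.init.normal_(layer.weight, std=layer.in_features ** -0.5)
            nn.init.zeros_(layer.bias)
    # Readout gets an extra 1/sqrt(fan_in) factor (std = 1/fan_in) so its
    # output stays O(1) as width grows.
    nn.init.normal_(model[-1].weight, std=1.0 / model[-1].in_features)
    return model

def mup_adam(model: nn.Sequential, base_lr: float, width: int,
             base_width: int = 64) -> torch.optim.Adam:
    # Width-aware learning rates (simplified: each layer's bias is grouped
    # with its weights): the input layer keeps base_lr; hidden and output
    # layers shrink their lr by base_width / width, so the optimal base_lr
    # found at base_width should transfer to larger widths.
    layers = [m for m in model if isinstance(m, nn.Linear)]
    groups = [{"params": layers[0].parameters(), "lr": base_lr}]
    for layer in layers[1:]:
        groups.append({"params": layer.parameters(),
                       "lr": base_lr * base_width / width})
    return torch.optim.Adam(groups)

# Grid-search base_lr on the small proxy model, then reuse it at scale.
torch.manual_seed(0)
X, y = torch.randn(512, 32), torch.randn(512, 1)

def final_loss(width: int, base_lr: float, steps: int = 200) -> float:
    model = make_mlp(width)
    opt = mup_adam(model, base_lr, width)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

best_lr = min((1e-3, 3e-3, 1e-2, 3e-2), key=lambda lr: final_loss(64, lr))
print("best base_lr at width 64:", best_lr)
print("loss at width 1024 with transferred lr:", final_loss(1024, best_lr))
```

Under this parameterization, the base_lr selected at width 64 is expected to remain near-optimal at width 1024; when and why that expectation holds or fails is the question the talk addresses.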
Bio: Nikhil Ghosh is a Research Fellow at the Center for Computational Mathematics, Flatiron Institute. His research focuses on developing a foundational understanding of deep learning, particularly questions related to model scaling and optimization. He obtained his Ph.D. in Statistics from UC Berkeley under the supervision of Bin Yu and Song Mei, and completed his B.S. in Computer Science at Caltech under the supervision of Yisong Yue.