MaD Seminar: Emergence and Scaling Laws in SGD Learning of Shallow Neural Networks

Speaker: Eshaan Nichani

Location: 60 Fifth Avenue, Room 150

Date: Thursday, September 11, 2025

In this talk, we study the sample and time complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with P orthogonal neurons on isotropic Gaussian data. We focus on the challenging “extensive-width” regime P ≫ 1 and permit a diverging condition number in the second layer, covering as a special case the power-law scaling a_p = p^{-\beta}. We provide a precise analysis of the SGD dynamics when a student two-layer network is trained to minimize the mean squared error (MSE) objective, and we explicitly identify the sharp transition times at which each signal direction is recovered. In the power-law setting, we characterize the scaling-law exponents of the MSE loss with respect to the number of training samples and SGD steps, as well as the number of parameters in the student network. Our analysis shows that while the learning of individual teacher neurons exhibits abrupt transitions, the juxtaposition of P ≫ 1 emergent learning curves at different timescales leads to a smooth scaling law in the cumulative objective. This is joint work with Yunwei Ren, Denny Wu, and Jason D. Lee.
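
To make the setup concrete, the following is a minimal NumPy sketch (not the speaker's code) of the teacher-student problem described in the abstract: a teacher two-layer network with P orthonormal first-layer directions and power-law second-layer coefficients a_p = p^{-\beta}, fit by a student two-layer network trained with online SGD on the MSE, using a fresh isotropic Gaussian sample at each step. The activation function, learning rate, student width, and all numerical values are illustrative assumptions and are not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d, P, beta = 256, 32, 1.5                       # input dim, teacher width, power-law exponent (assumed)
act = np.tanh                                   # assumed activation; not specified in the abstract
dact = lambda z: 1.0 - np.tanh(z) ** 2

# Teacher: orthonormal first-layer directions, power-law second-layer coefficients a_p = p^{-beta}.
W_star = np.linalg.qr(rng.standard_normal((d, P)))[0]   # d x P, orthonormal columns
a_star = np.arange(1, P + 1, dtype=float) ** (-beta)

def teacher(x):
    return a_star @ act(W_star.T @ x)

# Student: same architecture with M neurons, trained by online SGD on the squared loss.
M, lr, n_steps = 64, 1e-2, 100_000              # illustrative choices
W = rng.standard_normal((d, M)) / np.sqrt(d)
a = rng.standard_normal(M) / np.sqrt(M)

for t in range(n_steps):
    x = rng.standard_normal(d)                  # fresh isotropic Gaussian sample each step (online SGD)
    z = W.T @ x
    err = a @ act(z) - teacher(x)               # residual; loss = 0.5 * err**2
    grad_a = err * act(z)
    grad_W = err * np.outer(x, a * dact(z))
    a -= lr * grad_a
    W -= lr * grad_W
    if t % 10_000 == 0:
        X = rng.standard_normal((4096, d))      # Monte Carlo estimate of the population MSE
        mse = np.mean((act(X @ W) @ a - act(X @ W_star) @ a_star) ** 2)
        print(f"step {t:>7d}   MSE ~ {mse:.3e}")
```

Tracking the Monte Carlo MSE over training in such a simulation is one way to visualize the phenomenon discussed in the talk: individual teacher directions are recovered at sharply separated times, while the overall loss curve traces out a smooth scaling law.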

Bio: Eshaan Nichani is a final-year PhD student at Princeton University, advised by Jason D. Lee and Yuxin Chen. His research focuses on the theory of deep learning, from understanding the statistical and computational limits of shallow networks trained with SGD, to proving learning guarantees for transformers in order to model how LLM capabilities such as in-context learning arise during training. He is a recipient of the IBM PhD Fellowship and the NDSEG Fellowship.