MaD Seminar: Data Integration: Challenges and Opportunities for Interpolation Learning under Distribution Shifts
Speaker: Pragya Sur
Location: 60 Fifth Avenue, 7th Floor Open Space
Date: Thursday, October 16, 2025
Min-norm interpolators arise naturally as implicitly regularized limits of modern neural networks and other widely used algorithms. Recent work has studied their out-of-distribution risk when no test samples are available during training. In many applications, however, a limited amount of test data is accessible during training, and the properties of min-norm interpolation in this setting remain poorly understood. In this talk, I will present a characterization of the risk of pooled min-L2-norm interpolation under both covariate and concept shifts. I will show that the pooled interpolator captures both early fusion and an intermediate form of fusion. Our results yield several important insights. For instance, under concept shift, incorporating additional data can actually harm prediction performance when the signal-to-noise ratio is low. Conversely, at higher signal-to-noise ratios, transfer learning is beneficial, provided the shift-to-signal ratio stays below a precise threshold, which I will define. Furthermore, under covariate shift, we find that heterogeneity between domains can improve prediction accuracy when the model is sufficiently overparametrized. To reach these conclusions, we develop new tools in random matrix theory that are of broad utility for studying heterogeneous data problems under overparametrization. Time permitting, I will discuss applications of our results to the challenge of combining real data with synthetic data generated by AI models. This talk is based on joint work with Anvit Garg, Kenny Gu, Yanke Song, and Sohom Bhattacharya.
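For readers unfamiliar with the estimator, here is a minimal sketch of what pooled min-L2-norm interpolation might look like in a linear model. It assumes that "pooled" means stacking the source and target samples into a single design matrix. The minimum-norm solution of X beta = y is then computed with the Moore-Penrose pseudoinverse. The function name `pooled_min_norm_interpolator`, the dimensions, the noise level, and the way the concept shift is generated are all hypothetical illustrations, not the setup analyzed in the talk.

```python
import numpy as np

def pooled_min_norm_interpolator(X_source, y_source, X_target, y_target):
    """Hypothetical helper: min-L2-norm interpolator on pooled data.

    Stacks source and target samples and returns the minimum-norm solution
    of X @ beta = y via the pseudoinverse. When the total sample size n is
    smaller than the dimension p and X has full row rank, this beta fits
    every training point exactly (zero training error).
    """
    X = np.vstack([X_source, X_target])
    y = np.concatenate([y_source, y_target])
    return np.linalg.pinv(X) @ y

# Toy data: Gaussian covariates, with a concept shift between domains
# (source and target use different coefficient vectors; assumed setup).
rng = np.random.default_rng(0)
p, n_source, n_target = 200, 80, 20          # overparametrized: 100 < 200
beta_target = rng.normal(size=p) / np.sqrt(p)
beta_source = beta_target + 0.5 * rng.normal(size=p) / np.sqrt(p)
X_s = rng.normal(size=(n_source, p))
X_t = rng.normal(size=(n_target, p))
y_s = X_s @ beta_source + 0.1 * rng.normal(size=n_source)
y_t = X_t @ beta_target + 0.1 * rng.normal(size=n_target)

beta_hat = pooled_min_norm_interpolator(X_s, y_s, X_t, y_t)

# Check that the pooled estimator interpolates all training samples.
X_all = np.vstack([X_s, X_t])
y_all = np.concatenate([y_s, y_t])
print(np.allclose(X_all @ beta_hat, y_all))  # True
```

Among all vectors that interpolate the pooled data, the pseudoinverse solution is the one with the smallest Euclidean norm. Adding any direction from the null space of the pooled design matrix keeps the fit exact but increases the norm. Comparing its target-domain risk against a target-only interpolator is the kind of question the talk addresses. The sketch itself makes no claim about which estimator wins.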
Bio:
Pragya Sur is an Assistant Professor of Statistics at Harvard University, currently on leave as a Visiting Professor at MIT’s Laboratory for Information and Decision Systems. She works on high-dimensional and overparametrized problems arising in statistics, machine learning, and data science. She is a recipient of the NSF CAREER Award, the Eric and Wendy Schmidt Fund for Strategic Innovation, the William F. Milton Fund Award, and a Dean’s Competitive Fund for Promising Scholarship. Among other honors, in 2021 she was invited to speak at the National Academies of Sciences, Engineering, and Medicine symposium on Mathematical Challenges for Machine Learning and Artificial Intelligence (AI). From 2022 to 2024, she was invited to lead the Institute of Mathematical Statistics New Researchers Group. She currently serves as an Associate Editor for Statistical Science and as an invited Guest Co-Editor for its special issue on statistics and AI. She is also an incoming Associate Editor for the Journal of the Royal Statistical Society, Series B.