CILVR SEMINAR: Show Your Work with Confidence: Confidence Bands for Tuning Curves | Representations of Neural Network Training Dynamics

Speakers: Nick Lourie, Michael Hu

Location: 60 Fifth Avenue, 7th floor common area

Date: Thursday, April 11, 2024

Talk 1
Title: Show Your Work with Confidence: Confidence Bands for Tuning Curves
Abstract: The choice of hyperparameters greatly impacts performance in deep learning. Often, it is hard to tell whether one method is better than another or just better tuned. Tuning curves fix this ambiguity by accounting for tuning effort. Specifically, they plot validation performance as a function of the number of hyperparameter choices tried so far. While several estimators exist for these curves, it is common to use point estimates, which fail silently when given too little data. Beyond point estimates, confidence bands are necessary to rigorously establish the relationship between different approaches. In this talk, we present the first confidence bands for tuning curves. The bands are exact, simultaneous, and distribution-free, so they provide a robust basis for comparing methods. We validate their design with ablations, analyze the effect of sample size, and provide guidance on comparing models with our method. To learn more, see the paper “Show Your Work with Confidence: Confidence Bands for Tuning Curves,” or try out opda, our easy-to-use library that you can install with pip (https://github.com/nicholaslourie/opda).
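
For intuition, here is a minimal sketch of the point-estimate tuning curve that the abstract contrasts with the new confidence bands: given n validation scores from random hyperparameter search, the expected best score after k trials can be estimated from the order statistics of the observed scores. This sketch is illustrative only; it is not the paper's confidence-band construction, and the function and variable names below are our own. For exact, simultaneous, distribution-free bands, use the opda library linked above.

    import numpy as np

    def tuning_curve_point_estimate(scores, max_k=None):
        """Estimate E[best validation score after k random trials] for k = 1..max_k.

        Uses the order-statistic formula under the empirical distribution:
        E[max of k draws] = sum_i y_(i) * ((i/n)^k - ((i-1)/n)^k).
        """
        y = np.sort(np.asarray(scores, dtype=float))  # order statistics y_(1) <= ... <= y_(n)
        n = len(y)
        max_k = max_k or n
        i = np.arange(1, n + 1)
        curve = []
        for k in range(1, max_k + 1):
            # Probability that the best of k draws equals the i-th order statistic.
            weights = (i / n) ** k - ((i - 1) / n) ** k
            curve.append(float(np.dot(weights, y)))
        return curve

    # Example: validation accuracies from 20 hypothetical random-search trials.
    rng = np.random.default_rng(0)
    scores = rng.uniform(0.70, 0.92, size=20)
    print(tuning_curve_point_estimate(scores, max_k=5))

As the abstract notes, such point estimates say nothing about their own uncertainty, which is exactly the gap the confidence bands in this talk address.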
Bio: Nick Lourie takes seriously the project of building intelligent machines. By building them, he hopes to better understand how to make decisions and learn about the world. He’s held roles across the machine learning lifecycle from basic research to software engineering. Previously, he investigated machine ethics, common sense, prompting, and the evaluation of natural language processing models at the Allen Institute for AI. Later, he applied deep learning to financial markets at Two Sigma Investments. Currently, he is pursuing a PhD at NYU advised by He He and Kyunghyun Cho, where he seeks to develop better statistical frameworks for designing, developing, and evaluating neural networks.


Talk 2
Title: Representations of Neural Network Training Dynamics
Abstract: Neural network training is neither homogeneous nor deterministic. Different training runs can exhibit different training dynamics, convergence times, and generalization properties, despite having the same optimization hyperparameters. In this work, we consider whether the training trajectory of a neural network can tell us anything about final outcomes of training. To this end, we fit a hidden Markov model (HMM) over sequences of metrics collected throughout training. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we produce a low-dimensional, discrete representation of training dynamics on grokking tasks, image classification, and masked language modeling. We use the HMM representation to study phase transitions and identify latent “detour” states that slow down convergence. Our work suggests that the training trajectory is indeed useful for predicting final outcomes of training; we discuss ways to exploit this hypothesis in future work.
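
As a rough illustration of the approach described above, one can fit a Gaussian HMM over per-run training metrics (for example, losses and gradient norms logged at checkpoints) and decode each run into a discrete latent-state trajectory. The sketch below uses the hmmlearn library with toy data; the metric choices, state count, and library are assumptions for illustration, not the authors' exact pipeline.

    import numpy as np
    from hmmlearn import hmm  # assumed dependency: pip install hmmlearn

    # Toy stand-in for metrics logged over training: each run is a (T, n_features)
    # array, e.g. columns = [train loss, validation loss, gradient norm].
    rng = np.random.default_rng(0)
    runs = [rng.normal(size=(200, 3)) for _ in range(8)]  # 8 runs, 200 checkpoints each

    # hmmlearn expects one stacked array plus the length of each sequence.
    X = np.concatenate(runs)
    lengths = [len(r) for r in runs]

    # Fit an HMM with a handful of latent states; the state count is a modeling choice.
    model = hmm.GaussianHMM(n_components=4, covariance_type="diag",
                            n_iter=100, random_state=0)
    model.fit(X, lengths)

    # Decode each run into its most likely latent-state sequence: a low-dimensional,
    # discrete summary of training dynamics (e.g., for spotting "detour" states).
    state_paths = [model.predict(r) for r in runs]
    print(state_paths[0][:20])
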
Bio: Michael Hu is a PhD student at the NYU Center for Data Science, advised by Kyunghyun Cho and Tal Linzen. He is interested in data-centric machine learning, training dynamics, and better algorithms for language model pretraining. Before NYU, he worked on reinforcement learning with Karthik Narasimhan and Tom Griffiths at Princeton. He is fortunate to be supported by an NSF Graduate Research Fellowship.