CILVR Seminar: Understanding LLM Training

Speaker: Angelica Chen

Location: 60 Fifth Avenue, 7th floor open space
Videoconference link: https://nyu.zoom.us/j/94564115869

Date: Wednesday, October 23, 2024

Many machine learning methods are evaluated using metrics measured only at the end of training; however, interpreting these metrics alone can be misleading. In this talk, we focus on two examples of how analyzing training dynamics can yield deeper insights into LLM behavior than interpreting the endpoints alone. In the first, we demonstrate how a common interpretability artifact may appear uncorrelated with model performance at the end of training, but in fact exhibits a causal relationship with key learning strategies at the beginning of training. In the second, we study an example where the theoretical properties of the optimal policy differ dramatically from those of the fully trained model. We then show how the model's learning dynamics on different partitions of the training dataset offer an explanation that reconciles this difference. In both cases, interpreting only the endpoint of training (whether theoretical or empirical) may misrepresent what the model actually learns during training.
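As a rough illustration of the kind of analysis the abstract describes (a hypothetical sketch, not the speaker's actual setup or data), the snippet below logs loss separately on two synthetic data partitions at several points during training, rather than only at the final checkpoint; the partition names and model are illustrative assumptions.

```python
# Hypothetical sketch: track per-partition loss over training steps
# instead of only inspecting the final (endpoint) metric.
import torch

torch.manual_seed(0)

# Two synthetic, illustrative partitions of a binary-classification set:
# "easy" (well-separated) and "hard" (noisy) examples.
def make_partition(n, noise):
    x = torch.randn(n, 2)
    y = (x[:, 0] + noise * torch.randn(n) > 0).float()
    return x, y

easy = make_partition(500, noise=0.1)
hard = make_partition(500, noise=2.0)

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = torch.nn.BCEWithLogitsLoss()

x_train = torch.cat([easy[0], hard[0]])
y_train = torch.cat([easy[1], hard[1]])

for step in range(201):
    opt.zero_grad()
    loss = loss_fn(model(x_train).squeeze(-1), y_train)
    loss.backward()
    opt.step()
    if step % 50 == 0:
        with torch.no_grad():
            # Per-partition losses expose learning dynamics that the
            # aggregate end-of-training metric alone would hide.
            l_easy = loss_fn(model(easy[0]).squeeze(-1), easy[1]).item()
            l_hard = loss_fn(model(hard[0]).squeeze(-1), hard[1]).item()
        print(f"step {step:3d}  easy-loss {l_easy:.3f}  hard-loss {l_hard:.3f}")
```

In this toy setting, the two partitions' loss curves can evolve quite differently even when the aggregate final loss looks unremarkable, which is the general intuition behind studying training dynamics on dataset partitions.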