Quantifying and Improving Generalization through Compression Bounds and Model Merging
Speaker: Sanae Lotfi
Location:
60 Fifth Avenue, 7th Floor Open Space
Videoconference link:
https://nyu.zoom.us/s/96159462724
Date: Wednesday, October 2, 2024
Gaining insight into the mechanisms behind the generalization of deep learning models is crucial for building on their strengths, addressing their limitations, and deploying them in safety-critical applications. As state-of-the-art models for various data modalities become increasingly large and are trained on internet-scale data, the notion of generalization becomes more challenging to define. Compression bounds provide a principled tool for characterizing a model’s ability to generalize by combining two components: its performance on the training data and its compressed size. In this talk, I will present our work on deriving and computing the first non-vacuous generalization bounds for pretrained large language models (LLMs), showing that these models can discover regularities that generalize to unseen data. Our findings reveal that larger LLMs not only have better generalization bounds but are also more compressible than smaller models. Additionally, I will explore how loss surface analysis and model merging offer new insights into improving generalization. Specifically, we show that end-to-end gradient-based learning of routing strategies between different models yields the best model-merging results in the multi-task learning setting.
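
The idea that generalization can be quantified by combining training performance with compressed size follows the familiar Occam-style compression-bound template. As a rough illustration of that template (a generic bound, not necessarily the exact one derived in this work): for a loss bounded in [0, 1], a hypothesis h described by a prefix-free code of K(h) bits, and n i.i.d. training samples, with probability at least 1 - δ,

R(h) \le \hat{R}(h) + \sqrt{\frac{K(h)\ln 2 + \ln(1/\delta)}{2n}},

so the bound is small only when the model both fits the training data (small empirical risk \hat{R}(h)) and admits a short description (small K(h)).

The routing result can likewise be pictured with a minimal sketch, assuming PyTorch and tiny linear "experts" standing in for real fine-tuned checkpoints; all class and variable names below are illustrative assumptions, not the speaker's implementation.

import torch
import torch.nn as nn

class LearnedRoutingMerge(nn.Module):
    """Merge several fine-tuned models by learning routing weights end to end."""

    def __init__(self, expert_weights):
        # expert_weights: list of weight tensors, one per fine-tuned model (illustrative).
        super().__init__()
        self.experts = torch.stack(expert_weights)           # (num_experts, d_out, d_in)
        self.routing_logits = nn.Parameter(torch.zeros(len(expert_weights)))

    def forward(self, x):
        # Softmax routing coefficients are trained by backpropagating the task loss.
        coeffs = torch.softmax(self.routing_logits, dim=0)   # (num_experts,)
        merged = torch.einsum("e,eoi->oi", coeffs, self.experts)
        return x @ merged.T

# Hypothetical usage: three "task experts", routing weights fit to a downstream task.
experts = [torch.randn(4, 8) for _ in range(3)]
model = LearnedRoutingMerge(experts)
optimizer = torch.optim.Adam([model.routing_logits], lr=1e-2)
x, y = torch.randn(16, 8), torch.randn(16, 4)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()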