Special Seminar
Theory and Practice of Efficient Learning at Scale (Pretraining and Finetuning)
Time and Location:
Nov. 12, 2024 at 2 PM; Online
Speaker:
Soufiane Hayou, Simons Institute (UC Berkeley)
Abstract:
State-of-the-art performance is achieved via a series of engineered modifications to existing neural architectures and their training procedures. A common feature of these networks is their large-scale nature: modern neural networks consist of billions, if not hundreds of billions, of trainable parameters. Moreover, empirical evaluations generally support the claim that increasing the scale of a neural network (e.g., width and depth) boosts model performance, provided it is done correctly. However, given a neural network model, it is not straightforward to answer the crucial question: how do we adjust the training hyperparameters (initialization, learning rate, etc.) as we scale the network? In this talk, I will show how we can leverage different mathematical results to efficiently scale and train neural networks, with applications in both pretraining and fine-tuning.
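To give a concrete flavour of what "adjusting hyperparameters as we scale" can look like in practice, the sketch below assigns per-layer learning rates that shrink as the hidden width grows, in the spirit of width-dependent scaling rules such as the maximal-update parameterization (muP). The model, the base width, and the 1/width rescaling factor are illustrative assumptions for this announcement, not the speaker's exact prescription.

```python
import torch
import torch.nn as nn

def build_mlp(width: int, depth: int, d_in: int = 32, d_out: int = 10) -> nn.Sequential:
    """Simple MLP whose hidden width we want to scale up."""
    layers = [nn.Linear(d_in, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, d_out))
    return nn.Sequential(*layers)

def width_scaled_param_groups(model: nn.Sequential, width: int,
                              base_width: int = 256, base_lr: float = 1e-3):
    """Hypothetical muP-flavoured rule: hidden-layer learning rates are
    rescaled by base_width / width so they shrink as the network widens."""
    linears = [m for m in model if isinstance(m, nn.Linear)]
    groups = []
    for i, layer in enumerate(linears):
        if i == 0 or i == len(linears) - 1:
            lr = base_lr                       # input/output layers keep the base LR
        else:
            lr = base_lr * base_width / width  # hidden layers: LR ~ 1/width
        groups.append({"params": layer.parameters(), "lr": lr})
    return groups

width = 1024
model = build_mlp(width=width, depth=4)
optimizer = torch.optim.Adam(width_scaled_param_groups(model, width))
```

The point of such rules is that hyperparameters tuned on a small "base" model can be transferred to much wider models without retuning; the talk discusses the mathematical results behind this kind of scaling and its use in both pretraining and fine-tuning.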