Special Seminar

Theory and Practice of Efficient Learning at Scale (Pretraining and Finetuning)

Time and Location:

Nov. 12, 2024, at 2 PM; Online

Speaker:

Soufiane Hayou, Simons Institute (UC Berkeley)

Abstract:

State-of-the-art performance is achieved via a series of engineered modifications to existing neural architectures and their training procedures. A common feature of these networks is their large scale: modern neural networks consist of billions, if not hundreds of billions, of trainable parameters. Moreover, empirical evaluations generally support the claim that increasing the scale of neural networks (e.g., width and depth) boosts model performance, provided it is done correctly. However, given a neural network model, it is not straightforward to answer the crucial question: how should we adjust the training hyperparameters (initialization, learning rate, etc.) as we scale the network? In this talk, I will show how we can leverage different mathematical results to efficiently scale and train neural networks, with applications in both pretraining and fine-tuning.
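
The sketch below is not part of the talk; it is a minimal, hypothetical NumPy illustration of the scaling question raised in the abstract. With a width-independent initialization (a fixed standard deviation of 0.02, an arbitrary illustrative choice), the typical size of a layer's pre-activations grows as the layer is widened, whereas an initialization standard deviation scaled like 1/sqrt(fan_in) keeps it roughly constant.

import numpy as np

rng = np.random.default_rng(0)

for width in (256, 1024, 4096):
    x = rng.standard_normal(width)  # unit-scale input vector
    # Width-independent initialization (std fixed at 0.02, a hypothetical choice)
    W_fixed = 0.02 * rng.standard_normal((width, width))
    # Width-aware initialization (std scaled like 1 / sqrt(fan_in))
    W_scaled = rng.standard_normal((width, width)) / np.sqrt(width)
    rms_fixed = np.sqrt(np.mean((W_fixed @ x) ** 2))    # grows like sqrt(width)
    rms_scaled = np.sqrt(np.mean((W_scaled @ x) ** 2))  # stays near 1
    print(f"width={width:5d}  fixed-init pre-activation RMS ~ {rms_fixed:.2f}  "
          f"scaled-init pre-activation RMS ~ {rms_scaled:.2f}")

The same kind of question arises for the learning rate and for depth, which is why the abstract stresses that hyperparameters must be adjusted as the network is scaled.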