Two Theoretical Analyses of Modern Deep Learning: Graph Neural Networks and Language Model Finetuning

Speaker: Noam Razin

Location: 60 Fifth Avenue, Room 204

Date: Thursday, December 7, 2023

The resurgence of deep learning was largely driven by architectures conceived in the 20th century and trained on labeled data. In recent years, deep learning has undergone paradigm shifts characterized by new architectures and training regimes. Despite the popularity of these new paradigms, their theoretical understanding is limited. In this talk, I will present two recent works on theoretical aspects of modern deep learning. The first work (to appear at NeurIPS 2023) considers the expressive power of graph neural networks and formally quantifies their ability to model interactions between vertices. As a practical application of the theory, I will introduce a simple edge sparsification algorithm that achieves state-of-the-art results. The second work (under review) identifies a fundamental vanishing gradients problem that arises when reinforcement learning is used to finetune language models. I will demonstrate the detrimental effects of this phenomenon and present possible solutions. Lastly, I will conclude with an outlook on important questions raised by the advent of foundation models and possible tools for addressing them.
The works covered in this talk are joint with Nadav Cohen, Tom Verbin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, and Etai Littwin.