Events
Algorithmic view on neural information processing
Speaker: Hadi Daneshmand
Location: 60 Fifth Avenue, Room 650
Date: Wednesday, November 1, 2023
Deep neural networks are powerful tools for processing data, but their data-processing mechanism remains enigmatic. Recent research has shed light on their inner workings: their compositional structure enables them to implement iterative optimization methods. In this talk, I will explore this emerging perspective, which links neural networks to optimization methods.
To begin, I will review experimental studies. I will delve into the "iterative inference" hypothesis, which suggests that neural networks carry out a form of gradient descent to process data, even though they do not explicitly compute gradients. I will present observations of such inference in convolutional networks and large language models.
Next, I will discuss theoretical studies of the "iterative inference" hypothesis. These studies prove that large language models are expressive enough to implement first-order optimization algorithms for certain function classes. While fascinating, expressivity results fall short of explaining the outcome of training. I will show how to overcome this limitation via landscape analysis of the training loss. Such analysis characterizes how information processing adapts to the data distribution for "in-context learning" of linear functions.
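To make the "iterative inference" view concrete, here is a minimal numerical sketch, not the model studied in the paper: it treats each layer as one step of preconditioned gradient descent on an in-context least-squares loss. The preconditioner P, the step count, and all variable names are illustrative assumptions rather than learned transformer weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# In-context linear regression: the "prompt" consists of pairs (x_i, y_i)
# generated from an unknown weight vector w_star, plus a query point.
d, n = 5, 20
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star
x_query = rng.normal(size=d)

# Iterative-inference view: each "layer" applies one step of preconditioned
# gradient descent on the in-context least-squares loss
#     L(w) = (1 / 2n) * ||X w - y||^2,
# and the prediction after k layers is w_k^T x_query.
P = np.linalg.inv(X.T @ X / n + 0.1 * np.eye(d))  # illustrative preconditioner
w = np.zeros(d)
for layer in range(10):
    grad = X.T @ (X @ w - y) / n   # gradient of the in-context loss
    w = w - P @ grad               # one preconditioned descent step per layer
    err = abs(w @ x_query - w_star @ x_query)
    print(f"layer {layer}: |prediction error at query| = {err:.2e}")
```

In this toy setting, the prediction at the query point improves layer by layer, mirroring the claim that depth plays the role of optimization iterations.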
This talk is based on joint work with Kwangjun Ahn, Xiang Cheng, and Suvrit Sra, titled “Transformers learn to implement preconditioned gradient descent for in-context learning,” to be presented at NeurIPS 2023.