Algorithmic view on neural information processing

Speaker: Hadi Daneshmand

Location: 60 Fifth Avenue, Room 650

Date: Wednesday, November 1, 2023

Deep neural networks are powerful tools for processing data, but the mechanism by which they process data remains enigmatic. Recent research has shed light on their inner workings: their compositional structure enables them to implement iterative optimization methods. In this talk, I will explore this new perspective, which links neural networks to optimization methods.
 

To begin, I will review experimental studies. I will delve into the "iterative inference" hypothesis, which suggests that neural networks use a form of gradient descent to process data, even though they do not explicitly compute gradients. I will present observations of such inference in convolutional networks and large language models.
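
As a concrete illustration of the hypothesis, the sketch below shows how a stack of identical layers can carry out gradient descent in its forward pass, with depth playing the role of optimization iterations. It is not a construction from the talk; the least-squares objective, step size, and depth are assumptions chosen for clarity.

    import numpy as np

    # Minimal sketch of "iterative inference": a deep network whose forward
    # pass through L identical layers mimics L steps of gradient descent on
    # an assumed least-squares objective 0.5 * ||A x - b||^2.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)
    step = 0.01          # assumed step size
    num_layers = 100     # depth plays the role of the iteration count

    x = np.zeros(5)      # "state" propagated through the layers
    for layer in range(num_layers):
        # One layer's forward computation = one gradient step on the objective.
        grad = A.T @ (A @ x - b)
        x = x - step * grad

    # The deep forward pass approaches the least-squares solution.
    x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.linalg.norm(x - x_star))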


Next, I will discuss theoretical studies of the "iterative inference" hypothesis. These studies prove that large language models are expressive enough to implement first-order optimization algorithms for certain function classes. While fascinating, expressivity results have limited power to explain the outcome of training. I will show how to overcome these limitations through a landscape analysis of the training loss. Such analysis characterizes how information processing adapts to the data distribution in "in-context learning" of linear functions.
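
The sketch below gives a minimal, illustrative version of this setting: a prompt of input-label pairs (x_i, y_i) with y_i = <w, x_i>, where each "layer" applies one preconditioned gradient step to an implicit iterate and the prediction for a query x_q is read off at the end. The preconditioner, step size, and depth are assumptions made for illustration, not the weights a trained transformer would learn.

    import numpy as np

    # In-context learning of a linear function via preconditioned gradient
    # descent: each "layer" performs one preconditioned gradient step on the
    # in-context least-squares loss over the prompt examples.
    rng = np.random.default_rng(1)
    d, n = 4, 32
    w_true = rng.standard_normal(d)
    X = rng.standard_normal((n, d))      # in-context inputs
    y = X @ w_true                       # in-context labels
    x_q = rng.standard_normal(d)         # query input

    P = np.linalg.inv(X.T @ X / n)       # assumed preconditioner (inverse empirical covariance)
    step = 1.0                           # assumed step size
    w_hat = np.zeros(d)                  # implicit iterate carried across layers

    for layer in range(5):
        grad = X.T @ (X @ w_hat - y) / n   # gradient of the in-context least-squares loss
        w_hat = w_hat - step * P @ grad    # one preconditioned gradient step per layer

    # Prediction for the query vs. the ground-truth linear function.
    print(x_q @ w_hat, x_q @ w_true)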

This talk is based on joint work with Kwangjun Ahn, Xiang Cheng, and Suvrit Sra titled “Transformers learn to implement preconditioned gradient descent for in-context learning,” which will be presented at NeurIPS 2023.