Events
Special MaD Seminar: Theoretical Foundations of Pretrained Models
Speaker: Qi Lei
Location:
TBA
Videoconference link:
https://nyu.zoom.us/j/99508761160
Date: Thursday, January 20, 2022
A pre-trained model refers to any model trained on broad data at scale and can be adapted (e.g., fine-tuned) to a wide range of downstream tasks. The rise of pre-trained models (e.g., BERT, GPT-3, CLIP, Codex, MAE) transforms applications in various domains and aligns with how humans learn. Humans and animals first establish their concepts or impressions from different data domains and data modalities. The learned concepts then help them learn specific tasks with minimal external instructions. Accordingly, we argue that a pre-trained model follows a similar procedure through the lens of deep representation learning.
1) Learn a data representation that filters out irrelevant information from the training tasks; 2) Transfer the data representation to downstream tasks with few labeled samples and simple models.
This talk establishes some theoretical understanding for pre-trained models under different settings, ranging from supervised pretraining, meta-learning, and self-supervised learning to domain adaptation or domain generalization. I will discuss the sufficient (and sometimes necessary) conditions for pre-trained models to work based on the statistical relation between training and downstream tasks. The theoretical analyses partly answer how they work, when they fail, guide technical decisions for future work, and inspire new methods in pre-trained models.