CILVR Seminar: Taming Large Pre-Trained Neural Language Models: Differentiable Game-Theoretic Regularization and Sensitivity-Guided Optimization

Speaker: Tuo Zhao

Location: TBA
Videoconference link: https://nyu.zoom.us/j/97986595706

Date: Tuesday, March 29, 2022

Abstract:

Pre-trained language models have fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, as pre-trained language models grow increasingly large, the gains in their generalization performance are becoming marginal, especially when only limited labeled data are available for the downstream tasks.

To improve generalization, we propose a new framework for fine-tuning pre-trained models. Our approach combines two key ingredients: (1) differentiable game-theoretic regularization, which effectively controls the complexity of the massive model; and (2) sensitivity-guided optimization, which reduces parameter redundancy through adaptive learning rates. Our experiments show that the proposed approach significantly outperforms existing methods on multiple NLP tasks. Moreover, our theoretical analysis provides new insights into game-theoretic regularization and into how the enormous size of neural network models can improve generalization.
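To make the first ingredient concrete, below is a minimal PyTorch sketch of a min-max ("game-theoretic") smoothness regularizer for fine-tuning: an inner adversary player searches for a small, norm-bounded perturbation of the input embeddings that maximally disturbs the model's predictions, and the outer fine-tuning player penalizes that disturbance. The function names, hyperparameter values, and the HuggingFace-style interface (a model accepting `inputs_embeds` and returning `.logits`) are illustrative assumptions, not the speaker's actual implementation.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetric KL divergence between two categorical distributions given as logits."""
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    p, q = p_log.exp(), q_log.exp()
    return ((p - q) * (p_log - q_log)).sum(dim=-1).mean()

def game_regularizer(model, embeds, clean_logits, eps=1e-3, step_size=1e-3, steps=1):
    # Inner player: projected gradient ascent on an embedding perturbation
    # bounded by eps, maximizing the change in the model's predictions.
    delta = torch.zeros_like(embeds).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        adv_logits = model(inputs_embeds=embeds + delta).logits
        inner_loss = symmetric_kl(adv_logits, clean_logits.detach())
        (grad,) = torch.autograd.grad(inner_loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    # Outer player: penalize the disturbance the adversary found.
    adv_logits = model(inputs_embeds=embeds + delta.detach()).logits
    return symmetric_kl(adv_logits, clean_logits)

# Usage inside a fine-tuning step (lam trades off task loss vs. smoothness):
# logits = model(inputs_embeds=embeds).logits
# loss = F.cross_entropy(logits, labels) + lam * game_regularizer(model, embeds, logits)
```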
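For the second ingredient, here is a minimal sketch of sensitivity-guided learning-rate scaling, again in PyTorch. Each parameter's sensitivity is approximated by the magnitude of the first-order Taylor term |theta * grad|, smoothed across steps; parameters whose smoothed sensitivity is low (and are therefore likely redundant) receive relatively larger step sizes so that model capacity is used more evenly. The class name, the exact scaling rule, and the hyperparameters are assumptions for illustration and may differ from the talk's formulation.

```python
import torch

class SensitivityScaledSGD:
    """Illustrative optimizer: rescales per-parameter step sizes by an
    exponential moving average of the first-order sensitivity |theta * grad|."""

    def __init__(self, params, lr=1e-3, beta=0.9, eps=1e-8):
        self.params = [p for p in params if p.requires_grad]
        self.lr, self.beta, self.eps = lr, beta, eps
        self.sens = [torch.zeros_like(p) for p in self.params]  # smoothed sensitivity

    @torch.no_grad()
    def step(self):
        for p, s in zip(self.params, self.sens):
            if p.grad is None:
                continue
            # First-order sensitivity estimate, smoothed over steps.
            s.mul_(self.beta).add_((p * p.grad).abs(), alpha=1.0 - self.beta)
            # Low-sensitivity parameters get a relatively larger share of the
            # step: scale is near 1 where s is small, shrinks where s is large.
            scale = 1.0 / (1.0 + s / (s.mean() + self.eps))
            p.add_(p.grad * scale, alpha=-self.lr)

    def zero_grad(self):
        for p in self.params:
            p.grad = None
```

The scaling rule above is just one way to translate "upweight low-sensitivity parameters" into code; it plugs into a standard loop (`loss.backward()`, then `optimizer.step()` and `optimizer.zero_grad()`).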