NLP/Text-as-Data Speaker Series: “Compositionality, or Systematic Generalization, in Transformers”

Speaker: Chris Manning

Location: 60 Fifth Avenue, 7th Floor Open Space

Date: Thursday, October 20, 2022

Are the neural transformer models now widely used in NLP compositional? Popular theories of compositionality from linguistics stipulate that compositional meaning representation systems must compute meaning in a way that mirrors the tree-like structure of natural language. There is an apparent tension between these compositional accounts of human language understanding, which are based on a restricted bottom-up computational process, and the enormous success of neural models like transformers, which can route information arbitrarily between different parts of their input. One possibility is that these models, while extremely flexible in principle, in practice learn to interpret language hierarchically, ultimately building sentence representations close to those predictable by a bottom-up, tree-structured model. To evaluate this possibility, we describe an unsupervised and parameter-free method to functionally project the behavior of any transformer into the space of tree-structured networks. Given an input sentence, we produce a binary tree that approximates the transformer’s representation-building process and a score that captures how tree-like the transformer’s behavior is on the input. Although calculating this score does not require training any additional models, it upper-bounds the fit between a transformer and any tree-structured approximation. Using this method, we show that transformers for three different tasks become more tree-like over the course of training, in some cases recovering, without supervision, the same trees as supervised parsers. These trees, in turn, are predictive of model behavior, with more tree-like models generalizing better on tests of compositional generalization.