CDS Seminar: How Text Models Acquire Syntax

Speaker: Naomi Saphra

Location: 60 Fifth Avenue, 7th Floor Open Space

Date: Wednesday, April 13, 2022

When we ask why a neural network is so effective at solving some task, some researchers mean, "How does training impose bias towards effective representations?" This approach can lead to inspecting loss landscapes, analyzing convergence, or identifying phase transitions in training. Other researchers mean, "How does this model represent linguistic structure?" This approach can lead to model probing, testing on challenge sets, or inspecting attention distributions. The work in this talk instead considers the question, "How does training impose bias towards linguistic structure?"

This question is of interest to NLP researchers as well as general machine learning researchers. Language has well-studied compositional behavior, offering a realistic but intuitive domain for studying the gradual development of structure over the course of training. Meanwhile, studying why existing models and training procedures are effective by relating them to language may suggest future improvements. I will discuss how models that target different linguistic properties diverge over the course of training, exhibiting behavior connected to current practices and theoretical proposals in training dynamics. I will propose a method for analyzing hierarchical behavior in LSTMs, and apply it in synthetic experiments to illustrate how LSTMs implicitly learn like classical parsers. Finally, I will present ongoing research on how mode connectivity and loss surface topography can influence the generalization capabilities of fine-tuned Transformer models.