NLP and Text-as-Data Speaker Series
Speaker: Michael Hahn (Saarland University)
Location: 60 Fifth Avenue, 7th floor common area
Date: Thursday, December 5, 2024
Recent progress in LLMs has rapidly outpaced our ability to understand their inner workings. This talk describes our recent work addressing this challenge. First, we develop rigorous mathematical theory characterizing the abilities and limitations of transformers in performing computations foundational to reasoning; we also examine similarities and differences with state-space models such as Mamba. Second, we propose a theoretical framework for understanding the success and failure of length generalization in transformers. Third, we propose a method for reading out information from activations inside neural networks, and apply it to mechanistically interpret transformers performing various tasks. I will close with directions for future research.