Events
TaD Seminar: Toward Scalable and Actionable Interpretability
Speaker: Yonatan Belinkov
Location:
60 Fifth Avenue, 7th floor open space
Videoconference link:
https://nyu.zoom.us/j/96270502871
Date: Thursday, October 23, 2025
Interpretability research has made many interesting discoveries about how language models and other deep learning models operate. Despite this progress, however, interpretability has lagged behind recent advances in how language models are used in practice. In this talk, I will describe some of our recent interpretability work, starting with scientific insights about the kinds of algorithms that language models may employ. I will then describe several case studies in which interpretability insights informed solutions to known problems: overcoming the modality gap in vision-language models and removing undesired information from trained models. I will end by suggesting directions for making interpretability research more scalable and actionable.
Bio: Yonatan Belinkov is an Assistant Professor at the Technion. He is a former Azrieli Faculty Fellow and was a Mind Brain and Behavior Postdoctoral Fellow at Harvard University. Prior to that, he received his PhD from MIT. He is spending the current academic year at the Kempner Institute at Harvard University thinking about issues of interpretability, controllability, multi-agent communication, and AI for Science.