Events
Bias in Language Model Representations: How to find it, why it matters, and ways it behaves unexpectedly
Speaker: Seraphina Goldfarb-Terrant
Location: 60 Fifth Avenue, Room 204
Date: Monday, October 3, 2022
Overcoming bias in language models is a rapidly growing area of research, but many existing debiasing methods are applied ad hoc to narrow tasks and settings. It is not clear why they worked or whether they would still work if anything changed. Debiasing needs more analysis and interpretability work.
In the first half of this talk, I’ll look at the effects of debiasing on model representations. This develops a more robust understanding of what is actually happening under various debiasing conditions, and what this means for improving debiasing efforts.
In the second half, I’ll share ongoing work on how bias appears differently in multilingual representations than in monolingual ones, and discuss the surprising sensitivity of bias effects to supposedly irrelevant model features.