Bias in Language Model Representations: How to find it, why it matters, and ways it behaves unexpectedly

Speaker: Seraphina Goldfarb-Tarrant

Location: 60 Fifth Avenue, Room 204

Date: Monday, October 3, 2022

Overcoming bias in language models is a rapidly growing area of research, but many existing debiasing methods are applied ad hoc to narrow tasks and settings. It is often unclear why they work, or whether they would still work if anything about the task or model changed. Debiasing needs more analysis and interpretability work.

In the first half of this talk, I'll examine the effects of debiasing on model representations, in order to develop a more robust understanding of what actually happens under various debiasing conditions and what this means for improving debiasing efforts.
In the second half, I'll share ongoing work on how bias manifests differently in multilingual representations than in monolingual ones, and discuss the surprising sensitivity of bias effects to supposedly irrelevant model features.