Events
Holistic measurement and mitigation of social bias in generative LMs
Speaker: Eric Smith
Location: 60 Fifth Avenue, Room 204
Date: Friday, March 3, 2023
Generative language models have grown increasingly large and capable in the past few years, as evidenced by ChatGPT and its massive popularity. As these models scale up, we must also scale up how we measure and improve their safety and fairness, so that they do not produce hateful outputs or treat people unequally based on demographics in ways that harm marginalized communities. This talk presents two recent papers in which we have sought to expand the capabilities of bias measurement by (1) measuring and mitigating differences in dialogue as a function of the speaker's race and gender as assumed from their name; and (2) measuring bias in prompt continuations and sentence likelihoods as a function of nearly 600 identity terms across 13 demographic axes.