Holistic measurement and mitigation of social bias in generative LMs

Speaker: Eric Smith

Location: 60 Fifth Avenue, Room 204

Date: Friday, March 3, 2023

Generative language models have grown increasingly large and capable in the past few years, as evidenced by ChatGPT and its massive popularity. As these models scale up, we must also scale up how we measure and improve their safety and fairness, so that they do not produce hateful outputs or treat people unequally based on demographics in ways that harm marginalized communities. This talk presents two recent papers in which we have worked to expand the capabilities of bias measurement by (1) measuring and mitigating differences in dialogue as a function of the race and gender assumed from a speaker's name; and (2) measuring bias in prompt continuations and sentence likelihoods as a function of nearly 600 identity terms across 13 different demographic axes.