Reliable Measurement for ML at Scale

Speaker: A. Feder Cooper

Location: 370 Jay Street, Room 825
Videoconference link: https://nyu.zoom.us/j/96027635964

Date: Thursday, March 14, 2024

To develop rigorous knowledge about ML models — and the systems in which
they are embedded — we need reliable measurements. But reliable measurement
is fundamentally challenging, and touches on issues of reproducibility,
scalability, uncertainty quantification, epistemology, and more. In this
talk, I will discuss the criteria needed to take reliability seriously:
criteria both for designing meaningful metrics and for methodologies that
let us measure those metrics dependably and efficiently, at scale and in
practice. I will give two examples from my research that put
these criteria into practice: (1) large-scale evaluation of training-data
memorization in large language models, and (2) evaluation of latent
arbitrariness in the binary classification settings studied in algorithmic
fairness.
Throughout this discussion, I will emphasize how public governance requires
making metrics understandable to diverse stakeholders. For this reason, my
work aims to design metrics that are legally cognizable — a goal that
frames both my ML and legal scholarship. I will highlight several important
connections that I have uncovered between ML and law, including the
relationships between (1) the generative-AI supply chain and US copyright
law, and (2) ML arbitrariness and arbitrariness in legal rules.
This talk reflects joint work with collaborators at The GenLaw Center,
Cornell CS, Cornell Law School, Google DeepMind, and Microsoft Research.