Principled Approaches for Trustworthy Algorithms, Statistics, and Machine Learning

Speaker: Gautam Kamath

Location: 60 Fifth Avenue, Room 150

Date: Monday, March 25, 2024

Despite impressive recent advances, machine learning models exhibit a number of critical deficiencies. They are prone to leaking sensitive information about their training data. They remain alarmingly brittle to attacks by malicious parties. Troublingly, these issues stem from more fundamental statistical vulnerabilities that have remained unresolved for decades, highlighting significant gaps in our understanding of how to deal with these important considerations. As long as these problems remain, our models will not be appropriate for deployment beyond toy settings.

In this talk, I will discuss recent advances on a number of these problems, which give key new algorithmic insights into how to address these considerations and enable real-world deployments that were previously thought infeasible. In a first vignette, we will explore how to guarantee individual privacy in machine learning models, with a particular focus on large language models and the important role played by public data in the training pipeline. In a second vignette, we will focus on how to perform mean estimation robustly, giving the first efficient and accurate algorithms for multivariate settings. We will then discuss connections to robustness against data poisoning attacks, robust exploratory data analysis, and surprising conceptual and technical connections with privacy.