CS Colloquium: Addressing regulatory challenges for AI in healthcare: Building a safe and effective machine learning life cycle

Speaker: Adarsh Subbaswamy

Location: Online
Videoconference link: https://nyu.zoom.us/j/93656139506

Date: Monday, February 28, 2022

As machine learning (ML) is beginning to power technologies in
high-impact domains such as healthcare, the need for safe and reliable
ML has been recognized at a national level. For example, the U.S. Food
and Drug Administration (FDA) has recently had to rethink its
regulatory framework for the ever-growing number of ML-powered medical
devices. The core challenge for such agencies is to determine whether
ML models will be safe and effective for their intended use. In my
research, I seek to develop safe and effective machine learning that
meets the needs of various stakeholders, including users, model
developers, and regulators.
Accomplishing this requires addressing technical challenges at every
stage of a machine learning system's life cycle, from new learning
algorithms that allow users to specify desirable behavior, to stress
tests and verification of safety properties, to model monitoring and
maintenance strategies.

In this talk, I will give an overview of my work on several parts of
the machine learning life cycle as they relate to the problem of
dataset shift: differences between a model's training and deployment
environments that can cause it to fail to generalize.
First, I will describe causally inspired learning algorithms that let
model developers specify potentially problematic dataset shifts ahead
of time and then learn models that are guaranteed to be stable to
those shifts (a toy illustration of the idea follows the abstract).
Then, I will describe a new evaluation method for stress-testing a
model's stability to dataset shift. This is generally difficult
because it requires evaluating the model on a large number of
independent datasets, and collecting them is often prohibitively
expensive. I will therefore present a distributionally robust
framework for evaluating a model's robustness to user-specified shifts
using only the available evaluation data (also sketched below).
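
To make the first idea concrete, here is a minimal, illustrative
sketch in Python (a toy stand-in, not the algorithms presented in the
talk). Suppose a developer specifies ahead of time that the
relationship between a feature x2 and the label may change at
deployment; one simple way to obtain a shift-stable model is to learn
without that feature. The simulated data, the feature names, and the
use of scikit-learn are assumptions made for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_env(n, x2_effect):
    # x1 is a stable predictor of y; the strength (and sign) of x2's
    # association with y is environment-dependent.
    y = rng.integers(0, 2, size=n)
    x1 = y + rng.normal(0.0, 1.0, size=n)
    x2 = x2_effect * y + rng.normal(0.0, 1.0, size=n)
    return np.column_stack([x1, x2]), y

X_tr, y_tr = sample_env(5000, x2_effect=2.0)    # training environment
X_te, y_te = sample_env(5000, x2_effect=-2.0)   # deployment: the specified shift

naive = LogisticRegression().fit(X_tr, y_tr)            # relies on the unstable feature
stable = LogisticRegression().fit(X_tr[:, [0]], y_tr)   # excludes it ahead of time

print("naive model, deployment accuracy: ", naive.score(X_te, y_te))
print("stable model, deployment accuracy:", stable.score(X_te[:, [0]], y_te))

In this simulation, the model that excludes the unstable feature
retains its accuracy in the shifted environment, while the naive model
degrades sharply.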
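
For the evaluation side, here is a rough sketch of what a
distributionally robust check can look like (again a simplified
stand-in, not the exact framework from the talk): rather than
reporting only the average loss on the single available evaluation
set, report the worst-case average loss over any subpopulation
comprising at least a user-chosen fraction of that set. The talk's
framework targets user-specified shifts; this sketch uses the simplest
case in which any sufficiently large subpopulation may constitute the
shifted population, and the exponential stand-in for per-example
losses is an assumption for illustration.

import numpy as np

def worst_case_subpopulation_loss(losses, alpha):
    # Worst-case average loss over any subpopulation containing at least
    # a fraction `alpha` of the evaluation examples: the mean of the
    # largest ceil(alpha * n) per-example losses (a CVaR of the loss).
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]
    k = int(np.ceil(alpha * len(losses)))
    return float(losses[:k].mean())

rng = np.random.default_rng(0)
per_example_losses = rng.exponential(scale=1.0, size=1000)  # stand-in losses

print("average loss:            ", per_example_losses.mean())
print("worst 20% subpopulation: ",
      worst_case_subpopulation_loss(per_example_losses, alpha=0.2))

A large gap between the two numbers flags that the model's average
performance hides subpopulations, and hence plausible shifted
deployment distributions, on which it performs much worse.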