Achieving AI Safety in a Contested World
Speaker: Yevgeniy Vorobeychik, Washington University
Location: 370 Jay Street, Room 825
Date: Thursday, February 6, 2025
As the increasing capabilities of AI-enabled systems have led to broad deployment across applications ranging from conversational agents to self-driving cars, safety considerations have become central to the current research agenda. However, the very meaning of safety has become broad and, in some cases, contested. For example, some may deem a response to a conversational prompt neutral while others find it offensive, and a driving behavior that some view as efficient may strike others as dangerously aggressive. A useful way to conceptualize safety considerations is to divide them into two categories: objective and subjective. The former (for example, running over a pedestrian) is not reasonably contested, while the latter (for example, how aggressively a self-driving car should merge onto a freeway) can admit a range of legitimate perspectives.

In this talk, I will present our recent work tackling both objective and subjective safety considerations. On the former, I will present learning-based approaches for synthesizing provably stable and safe neural network controllers in known dynamical systems, combining gradient-based methods for both synthesis and verification with ideas from curriculum learning. I will also briefly discuss our recent work on safety specifications that combine natural language with formal logic, in which we pair LLMs with conformal prediction to obtain provably correct plans. On the latter, I will discuss an axiomatic framework for preference learning that accounts for disagreement in safety preferences, as well as a novel approach to reinforcement learning with diverse task (e.g., safety) specifications that offers provable performance guarantees and achieves state-of-the-art results in zero-shot and few-shot settings.