An Invitation to Imitation

Speaker: J. Andrew (Drew) Bagnell

Location: 370 Jay Street, Room 1201
Videoconference link: https://nyu.zoom.us/meeting/register/tJcscOGqrzMqGtWzsv4t1Nt-iys7nH4ackq4

Date: Monday, April 15, 2024

Many (most?) AI problems are better framed as imitation learning than as supervised learning. Whether in learning for self-driving vehicles or in Large Language Models, it is increasingly important to understand and mitigate the compounding errors that occur when a learner's outputs influence its own inputs.

We identify and analyze this core problem of imitation learning: distribution shift induced by small errors on the learner's part. Simple notions of non-realizability capture the inherent statistical errors made during learning, whether those arise in practice from imperfect optimization, access to less information than the human demonstrator has, imperfect and limited data, or limits on model complexity.
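
As a hedged illustration (not part of the abstract), the compounding-error phenomenon is often summarized by bounds of the following form, where epsilon denotes the learner's per-step imitation error, T the task horizon, and J the expected cost of a policy; these symbols are introduced here only for exposition.

```latex
% Illustrative bounds only; \varepsilon, T, and J(\cdot) are expository symbols, not from the abstract.
% Behavior cloning: small per-step errors can compound quadratically over the horizon,
J(\hat{\pi}) \;\le\; J(\pi^{\ast}) + O\!\left(T^{2}\,\varepsilon\right),
% whereas interactive imitation (e.g., a DAgger-style reduction) yields a linear dependence,
J(\hat{\pi}) \;\le\; J(\pi^{\ast}) + O\!\left(T\,\varepsilon\right).
```

The quadratic versus linear dependence on the horizon is the precise sense in which interaction mitigates compounding errors.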

We focus on two particular approaches to mitigating error compounding. The first class, in the style of Dataset Aggregation (DAgger), requires an interactive expert that can provide corrections to the learner. The second, Inverse Reinforcement Learning (IRL), requires less: only interactive access to an environment. Despite this, new results show the latter is more powerful than the former. Those improvements typically come at the expense of turning the imitation learning problem into a sequence of harder reinforcement learning problems; however, we present new algorithms that are (provably) sample efficient as well as empirically effective. We discuss recent algorithms and extensions for LLMs that can be understood in the DAgger and IRL frameworks.
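
As a minimal sketch of the DAgger-style interactive loop described above (the environment interface, `expert_action`, and `fit_policy` are hypothetical placeholders introduced for illustration, not APIs from the talk):

```python
# Hedged sketch of a DAgger-style interactive imitation loop.
# `env`, `expert_action`, and `fit_policy` are illustrative placeholders.

def dagger(env, expert_action, fit_policy, n_iters=10, horizon=100):
    dataset = []                      # aggregated (state, expert label) pairs
    policy = fit_policy(dataset)      # initial policy, e.g., cloned from any seed demonstrations
    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Roll out the *learner's* policy so training states match the states
            # the learner will actually visit...
            action = policy(state)
            # ...but label every visited state with the interactive expert's correction.
            dataset.append((state, expert_action(state)))
            state, done = env.step(action)
            if done:
                break
        # Retrain on the aggregated dataset, which now reflects the learner's own distribution.
        policy = fit_policy(dataset)
    return policy
```

The key design choice is that states are generated by rolling out the learner's own policy while labels come from the interactive expert, so the training distribution tracks the distribution the learner induces at test time.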