Two recent lower bounds for interactive decision making

Speaker: Yanjun Han

Location: 60 Fifth Avenue, Room 150

Date: Thursday, September 21, 2023

A fundamental problem in interactive decision making, ranging from bandit problems to reinforcement learning, is to understand which modeling assumptions lead to sample-efficient learning guarantees, and which algorithm design principles achieve optimal sample complexity. While both questions are well understood for classical problems of statistical estimation and learning, relatively few tools are available for analyzing the fundamental limits of their interactive counterparts.

In this talk I will present two general lower bound techniques for interactive decision making. First, we introduce a complexity measure, called the Constrained Decision-Estimation Coefficient, which is an interactive counterpart of the Donoho-Liu type modulus of continuity in statistical estimation. This complexity measure provides a lower bound on the optimal regret for general interactive problems, as well as a matching upper bound up to an online estimation error. Second, we attempt to close this gap via a generalization of Fano-type arguments, using a suitable notion of information for interactive problems. In a special class of problems called ridge bandits, this new tool yields lower bounds on the entire learning trajectory via differential equations. We also provide upper bounds governed by similar differential equations, thereby illustrating the difficulty of finding a unified complexity measure in general.
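As a rough illustration (in my own notation, following the definitions in the first paper cited below rather than anything stated in this abstract), the Constrained Decision-Estimation Coefficient for a model class \(\mathcal{M}\) and reference model \(\bar{M}\) takes a form along the lines of

\[
\mathrm{dec}^{\mathrm{c}}_{\varepsilon}(\mathcal{M}, \bar{M})
= \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}}
\Big\{ \mathbb{E}_{\pi \sim p}\big[f^M(\pi_M) - f^M(\pi)\big]
\;:\; \mathbb{E}_{\pi \sim p}\big[D_{\mathrm{H}}^2\big(M(\pi), \bar{M}(\pi)\big)\big] \le \varepsilon^2 \Big\},
\]

where \(p\) ranges over distributions on the decision space \(\Pi\), \(f^M\) is the value function of model \(M\), \(\pi_M\) is the optimal decision under \(M\), and \(D_{\mathrm{H}}^2\) denotes squared Hellinger distance: the learner chooses decisions to minimize suboptimality against the worst model that remains \(\varepsilon\)-indistinguishable from the reference model. Regret lower bounds then scale roughly as \(T\) times this quantity at the estimation rate \(\varepsilon \asymp 1/\sqrt{T}\); see https://arxiv.org/abs/2301.08215 for the precise statement.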

Based on recent joint work with Dylan Foster, Noah Golowich, Jiantao Jiao, Nived Rajaraman, and Kannan Ramchandran (https://arxiv.org/abs/2301.08215 and https://arxiv.org/abs/2302.06025).