CILVR Seminars: Can Video Generative Models Learn and Generalize Physics-based Control Signals?

Speaker: Chen Sun (Brown)

Location: 60 Fifth Avenue, 7th Floor Open Space
Videoconference link: https://nyu.zoom.us/j/92427590339

Date: Wednesday, April 29, 2026

Abstract: Recent advances in video generation models have sparked interest in using video to simulate physical causes and effects. While "world modeling" in navigation has been well explored, physically meaningful interactions that mimic real-world forces remain largely understudied. In this talk, we investigate using physical forces as a conditioning signal for video generation, which enables users and agents to interact with images and videos through both localized point forces, such as poking a plant, and global force fields, such as wind blowing on fabric. We demonstrate that these "force prompts" can enable videos to respond realistically to physical control signals by leveraging the visual and motion priors in the original pretrained model, without using any 3D assets or physics simulators at inference time. The primary challenge of force prompting is obtaining high-quality paired force-video training data: real-world data is scarce because force signals are hard to measure, and synthetic data is limited by the visual quality and domain diversity of physics simulators. Our key finding is that video generation models can generalize remarkably well when adapted to follow physical force conditioning using only synthetic videos. Building on this, we explore the role of such force-conditioned world models in robot planning and decision-making.


Bio: Chen Sun is an assistant professor of computer science at Brown University and a staff research scientist (part-time) at Google DeepMind, studying computer vision and machine learning. His current research focuses on video-centric world models and their application in robotics. Chen has received an NSF CAREER Award, a Richard B. Salomon Faculty Research Award, and a Samsung Global Research Outreach Award. His work on behavior prediction in videos was a CVPR 2019 best paper finalist. Previously, Chen received his PhD from the University of Southern California in 2016 and his bachelor's degree from Tsinghua University in 2011.