CILVR Seminar: Making 3D More Expressive for Robots

Speaker: Chung Min Kim

Location: 60 Fifth Avenue, 7th Floor Open Space
Videoconference link: https://nyu.zoom.us/s/99996155620

Date: Wednesday, March 19, 2025

Robots need to interpret and interact with the same world we live in—a world where objects are not just independent entities but exist in relation to, or even as part of, one another. As humans, we naturally perceive scenes holistically and recognize that a single point in space can take on different meanings depending on context. Yet, when we provide 3D reconstructions of these environments to robots, semantic information is often structured at a single level or within predefined levels, limiting adaptability. In this talk, I will introduce a method that allows scenes to be understood at multiple levels by adjusting semantic groupings based on their physical scale. I’ll first show how we build these representations using semantic language grounding (LERF) and multi-scale object grouping (GARField). Then, I’ll demonstrate how these representations can be used for robotic manipulation—from task-oriented grasping to articulated object interaction—as well as in 4D motion reconstruction.
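The core idea above — that the same 3D point can belong to different semantic groups depending on the physical scale of the query — can be illustrated with a toy sketch. This is not the GARField implementation; the group names and scale ranges below are hypothetical, and a real scale-conditioned field would be a learned function over 3D coordinates rather than a lookup table.

```python
# Toy illustration of scale-conditioned semantic grouping (hypothetical data,
# not the actual GARField method): one fixed 3D point maps to different groups
# depending on the physical scale (in meters) at which we query it.

# Hypothetical groups containing the point, each valid over a scale interval.
GROUPS_AT_POINT = [
    ("mug handle", 0.00, 0.05),   # fine scale: a part of an object
    ("mug", 0.05, 0.20),          # mid scale: the object itself
    ("table setting", 0.20, 1.50) # coarse scale: a larger arrangement
]

def group_at_scale(scale):
    """Return the semantic group active at the given query scale, or None."""
    for name, lo, hi in GROUPS_AT_POINT:
        if lo <= scale < hi:
            return name
    return None

# Querying the same point at three scales yields three different groupings.
print(group_at_scale(0.03))  # part level
print(group_at_scale(0.10))  # object level
print(group_at_scale(1.00))  # scene level
```

In the actual representation described in the talk, the scale parameter conditions a learned 3D field, so groupings vary continuously with scale instead of switching at hand-picked thresholds.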