CS Colloquium: How to Close the 100,000 Year “Data Gap” in Robotics
Speaker: Ken Goldberg, UC Berkeley and Ambi Robotics
Location: 60 Fifth Avenue, Room 150
Date: Friday, September 19, 2025
Large models based on internet-scale data can now pass the Turing Test for intelligence. In this sense, data has "solved" language, and by analogy many claim that data has solved speech recognition and computer vision. Will data also solve robotics and automation, allowing general-purpose humanoid robots to achieve human-level performance? Using commonly accepted metrics for converting word and image tokens into time, the amount of internet-scale data used to train contemporary large vision-language models (VLMs) is on the order of 100,000 years. Robotics has no comparably large body of data, which is the "data gap" of the title. I’ll review three ways researchers are pursuing to close this gap, and a fourth approach, where data is collected as real robots operate in real commercial environments. This fourth approach requires bootstrapping with AI and "good old-fashioned engineering" to create robots with real return on investment that will be adopted by industry. Such robots can create a "data flywheel" to increase performance and enable new functionality, accelerating the timeline to achieve reliable, general-purpose robots.