MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Speaker: Jim Fan

Location: TBA
Videoconference link: https://mit.zoom.us/j/91205686838?pwd=ODBaanFyMlY4TVQxZ0UwYmswTyt1Zz09

Date: Wednesday, August 10, 2022

Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited objectives, and thus fail to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports an infinite variety of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with 730K YouTube videos, 7K Wiki pages, and 340K Reddit posts. Using MineDojo’s data, we propose a novel agent learning algorithm that leverages large pre-trained video-language models as a learned reward function. Our agent is able to solve Minecraft tasks specified in free-form language without any manually designed dense shaping reward. MineDojo is open-sourced at https://minedojo.org. We look forward to seeing how MineDojo empowers the community to make progress on the grand challenge of open-ended agent learning.

This talk is jointly sponsored by NYU and MIT.

Speaker bio: Jim (Linxi) Fan is a research scientist at NVIDIA AI. He obtained his CS PhD degree from Stanford, advised by Prof. Fei-Fei Li. His PhD dissertation was titled “Training and Deploying Visual Agents at Scale”. Previously, he did AI research internships at NVIDIA, OpenAI, Google AI, and Baidu Silicon Valley Labs. His current research interests are foundation models for embodied agents, reinforcement learning, robotics, and large-scale distributed training.