CILVR Seminar: On Bringing Robots Home | NetHack is Hard to Hack

Speakers: Nur Muhammad “Mahi” Shafiullah, Ulyana Piterbarg

Location: 60 Fifth Avenue, 7th floor common area

Date: Thursday, December 7, 2023

Title: On Bringing Robots Home
Abstract: Throughout history, we have successfully integrated various machines into our homes. Dishwashers, laundry machines, stand mixers, and robot vacuums are just a few recent examples. However, these machines excel at performing only a single task effectively. The concept of a “generalist machine” in homes - a domestic assistant that can adapt and learn from our needs, all while remaining cost-effective - has long been a goal in robotics that has been steadily pursued for decades. In this work, we initiate a large-scale effort towards this goal by introducing Dobb·E, an affordable yet versatile general-purpose system for learning robotic manipulation within household settings. Dobb·E can learn a new task with only five minutes of a user showing it how to do it, thanks to a demonstration collection tool (“The Stick”) we built out of cheap parts and iPhones. We use the Stick to collect 13 hours of data in 22 homes of New York City, and train Home Pretrained Representations (HPR). Then, in a novel home environment, with five minutes of demonstrations and fifteen minutes of adapting the HPR model, we show that Dobb·E can reliably solve the task on the Stretch, a mobile robot readily available on the market. Across roughly 30 days of experimentation in homes of New York City and surrounding areas, we test our system in 10 homes, with a total of 109 tasks in different environments, and finally achieve a success rate of 81%. Beyond success percentages, our experiments reveal a plethora of unique challenges absent or ignored in lab robotics. These range from effects of strong shadows to variable demonstration quality by non-expert users. With the hope of accelerating research on home robots, and eventually seeing robot butlers in every home, we open-source the Dobb·E software stack and models, our data, and our hardware designs on our website, https://dobb-e.com

Brief bio: Nur Muhammad “Mahi” Shafiullah is a fourth-year Ph.D. student in the NYU CS department, advised primarily by Prof. Lerrel Pinto. Mahi primarily focuses on Robot Learning, which is the intersection of robotics and machine learning that allows robots to learn both from humans and on their own. Currently, he is working on how to best get robots to cohabitate and collaborate with humans in household settings. Previously, Mahi was a visiting scientist at Fundamental AI Research (FAIR) at Meta with Ishan Misra. Before that, he was at MIT, working on Robust Machine Learning with Prof. Aleksander Madry, where he got his master's and undergraduate degrees. Mahi is grateful to be supported by the 2023 Apple Scholars in AI/ML PhD fellowship.


Title: NetHack is Hard to Hack

Abstract: Neural policy learning methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents outperformed neural approaches by over four times in median game score. We delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. To conduct this study, we analyze the winning symbolic agent, extending its codebase to track internal strategy selection in order to generate one of the largest available demonstration datasets. Utilizing this dataset, we examine (i) the advantages of an action hierarchy; (ii) enhancements in neural architecture; and (iii) the integration of reinforcement learning with imitation learning. Our investigations produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score. However, we also demonstrate that mere scaling is insufficient to bridge the performance gap with the best symbolic models or even the top human players.

Bio: Ulyana is a third-year Ph.D. student in CS at NYU, advised by Prof. Rob Fergus and Prof. Lerrel Pinto. Ulyana is interested in learning settings where the promises of scaling up neural approaches fall flat, like open-ended and long-context environments. She currently studies large vision-language models and large language models for long-context decision-making. Previously, she worked on neural methods for accelerating climate simulators at the Climate Modeling Alliance and Google Applied Science. Before that, she received a B.S. in math at MIT, where she worked with Prof. Josh Tenenbaum and Dr. Kelsey Allen on physical problem solving in humans and machines. Ulyana is grateful to be supported by an NSF Graduate Research Fellowship.