Events
Colloquium: Making Sense of the Multimodal World
Speaker: Carl Vondrick, Columbia University
Location: 60 Fifth Avenue, Room 150
Date: Wednesday, December 4, 2024
People experience the world through modalities of sight, sound, words, touch, and more. By leveraging the natural relationships among these modalities and developing multimodal learning methods, my research creates artificial perception systems with diverse skills, including spatial, physical, logical, and cognitive abilities, for flexibly analyzing visual data. This multimodal approach provides versatile representations for tasks like 3D reconstruction, visual question answering, and object recognition, while offering inherent explainability and excellent zero-shot generalization across tasks. By closely integrating diverse modalities, we can overcome key challenges in machine learning and enable new capabilities for computer vision, especially for the many upcoming applications where physical interaction is required.