CS Colloquium: Towards Scalable Representation Learning for Visual Recognition

Speaker: Saining Xie

Location: 60 Fifth Avenue, Room 150
Videoconference link: https://nyu.zoom.us/j/91395863843

Date: Tuesday, March 29, 2022

Powerful biological and cognitive representations underlie
humans' remarkable visual recognition abilities. Deep learning has
achieved unprecedented success in a variety of domains over the last
decade. One major driving force is representation learning, which is
concerned with learning efficient, accurate, and robust representations
from raw data that are useful for a downstream classifier or predictor.
A modern deep learning system is composed of two core and often
intertwined components: 1) neural network architectures and 2)
representation learning algorithms. In this talk, we will present
several studies in both directions. On the neural network modeling side,
we will examine modern network design principles and how they affect the
scaling behavior of ConvNets and recent Vision Transformers.
Additionally, we will demonstrate how we can acquire a better
understanding of neural network connectivity patterns through the lens
of random graphs. In terms of representation learning algorithms, we
will discuss our recent efforts to move beyond the traditional
supervised learning paradigm and demonstrate how self-supervised visual
representation learning, which does not require human annotated labels,
can outperform its supervised learning counterpart across a variety of
visual recognition tasks. The talk will span a variety of vision
application domains and modalities (e.g., 2D images, 3D scenes, and
language). The goal is to highlight connections between the techniques
specialized for different input modalities and to offer insights into
the distinct challenges that each modality presents. Finally,
we will discuss several pressing challenges and opportunities that the
"big model era" raises for computer vision research.