Scaling up semi-supervised learning to gigantic image collections
Rob Fergus (CIMS)
Abstract:
With the advent of the Internet, it is now possible to collect hundreds
of millions of images. These images come with varying degrees of label
information: "clean labels" can be manually obtained on a small
fraction, "noisy labels" may be extracted automatically from
surrounding text, while for most images there are no labels at all.
Semi-supervised learning is a principled framework for combining these
different label sources. But semi-supervised learning scales
polynomially with the number of images, which makes it impractical for
hundreds of millions of images with thousands of labels.
In this paper we show how to utilize recent results in machine learning
to obtain highly efficient approximations to semi-supervised learning.
Specifically, we use the convergence of the eigenvectors of the
normalized graph Laplacian to eigenfunctions of weighted
Laplace-Beltrami operators. Our algorithm enables us to apply
semi-supervised learning to clean up a huge database of millions of
images.
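
To make the setting concrete, the following is a minimal small-scale
sketch of graph-based semi-supervised learning restricted to the
smallest eigenvectors of the normalized graph Laplacian. It is an
illustration under stated assumptions, not necessarily the algorithm
presented in the talk: the Gaussian affinity, the quadratic objective
f'Lf + (f - y)'Lambda(f - y), and the names `normalized_laplacian`,
`ssl_in_eigenbasis`, `sigma`, `k` and `label_weight` are all hypothetical
choices made for the example. Note that it computes exact eigenvectors
with a dense eigendecomposition, which is the polynomial-cost step that
becomes infeasible for millions of images and that an eigenfunction
approximation would have to replace.

```python
import numpy as np


def normalized_laplacian(X, sigma=1.0):
    """Dense normalized Laplacian I - D^{-1/2} W D^{-1/2}.

    Assumption: W is a Gaussian affinity with bandwidth sigma.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def ssl_in_eigenbasis(X, y, label_weight, k=2, sigma=1.0):
    """Minimize f'Lf + (f - y)'Lambda(f - y) over f = U @ alpha.

    U holds the k eigenvectors of L with the smallest eigenvalues and
    Lambda = diag(label_weight); unlabeled points have weight 0.
    """
    L = normalized_laplacian(X, sigma)
    evals, evecs = np.linalg.eigh(L)  # ascending eigenvalues, O(n^3) cost
    U = evecs[:, :k]
    S = np.diag(evals[:k])
    # Setting the gradient to zero gives a k x k linear system:
    # (S + U' Lambda U) alpha = U' Lambda y
    A = S + U.T @ (label_weight[:, None] * U)
    b = U.T @ (label_weight * y)
    alpha = np.linalg.solve(A, b)
    return U @ alpha


# Toy data: two well-separated clusters with one labeled point each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])
y = np.zeros(40)
y[0], y[20] = 1.0, -1.0
label_weight = np.zeros(40)
label_weight[[0, 20]] = 100.0

f = ssl_in_eigenbasis(X, y, label_weight)
print(np.sign(f[:20]).sum(), np.sign(f[20:]).sum())
```

In this toy case the two labels propagate to every point in their
respective clusters through the sign of f. Once the basis U is
available, only a k x k system has to be solved per label, so the cost
is dominated by obtaining U.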
Joint work with: Yair Weiss (Hebrew U.) and Antonio Torralba (MIT)