Learning hierarchical representations to compose new data

Speaker: Matthieu Wyart

Location: 60 Fifth Avenue, 7th floor open space

Date: Thursday, May 30, 2024

Abstract: Learning generic tasks in high dimension is impossible. Yet, deep networks classify images, large models learn the structure of language and produce meaningful text, and diffusion-based models generate new images of high quality. In all these cases, building a hierarchical representation of the data is believed to be key to success. How is it achieved? How much data is needed, and how does this depend on the structure of the data? Once such a representation is obtained, how can it be used to compose new data from known low-level features? I will introduce generative models of hierarchical data for which an understanding of these questions is emerging. I will discuss recent results on (i) supervised learning, (ii) next-token prediction, and (iii) score-based generative models. In the last two cases, our framework makes novel predictions that we test on both text and image datasets.
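To make the phrase "generative models of hierarchical data" concrete, here is a minimal illustrative sketch, in the spirit of a probabilistic context-free grammar. It is an assumption on my part, not necessarily the speaker's exact model: each symbol at one level expands into a pair of symbols at the level below, so composing L levels yields data whose structure is hierarchical by construction.

```python
import random

def make_rules(num_symbols, rules_per_symbol, seed):
    """For each symbol, draw a few allowed expansions into symbol pairs.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    return {
        s: [(rng.randrange(num_symbols), rng.randrange(num_symbols))
            for _ in range(rules_per_symbol)]
        for s in range(num_symbols)
    }

def generate(root, rules_by_level, rng):
    """Expand a root symbol down the hierarchy into a sequence of leaves.

    Each pass doubles the sequence length, so L levels give 2**L leaves.
    """
    sequence = [root]
    for rules in rules_by_level:  # one rule set per level of the hierarchy
        expanded = []
        for s in sequence:
            expanded.extend(rng.choice(rules[s]))
        sequence = expanded
    return sequence

if __name__ == "__main__":
    rng = random.Random(0)
    # 3 levels of expansion -> leaf sequences of length 2**3 = 8
    levels = [make_rules(num_symbols=4, rules_per_symbol=2, seed=k)
              for k in range(3)]
    sample = generate(root=0, rules_by_level=levels, rng=rng)
    print(len(sample), sample)
```

In such a model, low-level features (leaves) are reused across many high-level classes, which is one way to pose the questions in the abstract: how much data a learner needs to recover the latent hierarchy, and how known low-level features can be recombined to compose new data.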