Mathematics Colloquium

Allowing Image And Text Data To Communicate

Time and Location:

May 05, 2025 at 3:45PM; Warren Weaver Hall, Room 1302

Speaker:

Andrew Stuart, California Institute of Technology

Abstract:

A fundamental problem in artificial intelligence is the question of how to simultaneously deploy data from different sources such as audio, image, text and video; such data is known as multimodal. In this talk I will focus on the canonical problem of aligning image and text data, and describe some of the mathematical ideas underlying the challenge of allowing them to communicate. I will describe the encoding of text and image in Euclidean spaces and describe contrastive learning methods to identify and learn embeddings which align these two modalities; I will also describe the attention mechanism, a form of nonlinear correlation in vector-valued sequences, which is central to this endeavour. Attention turns out to be useful beyond this specific context, and I will show how it may be used to design and learn maps between Banach spaces or between spaces of probability measures. Problems arising in data assimilation will be used, throughout the talk, to illustrate the theory and methodology.
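
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal NumPy sketch and not the speaker's formulation. It assumes the standard scaled dot-product form of attention and a symmetric, CLIP-style contrastive loss on cosine similarities. The function names, array shapes and the temperature value 0.07 are illustrative assumptions, and the random arrays merely stand in for learned image and text embeddings.

```python
import numpy as np


def log_softmax(scores, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def attention(queries, keys, values):
    """Scaled dot-product attention.

    Each output row is a weighted average of the rows of `values`; the weights
    come from a softmax over query-key inner products, which is the
    "nonlinear correlation" between the two sequences.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # pairwise correlations
    weights = np.exp(log_softmax(scores, axis=-1))   # each row sums to 1
    return weights @ values, weights


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities.

    Row i of `image_emb` and row i of `text_emb` are assumed to be a matched
    pair; every other row in the batch acts as a negative example.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    image_to_text = log_softmax(logits, axis=1)[idx, idx]  # pick the text for each image
    text_to_image = log_softmax(logits, axis=0)[idx, idx]  # pick the image for each text
    return -0.5 * (image_to_text.mean() + text_to_image.mean())


# Toy usage with random vectors standing in for learned embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
loss_aligned = contrastive_loss(text, text)           # matched pairs coincide
loss_mismatched = contrastive_loss(text[::-1], text)  # pairs deliberately scrambled
```

In this toy setting the loss is lower when matched pairs share an embedding than when the pairing is scrambled, which is the sense in which minimising it aligns the two modalities.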