CILVR seminar: Simple and effective discrete diffusion language models

Speaker: Prof. Volodymyr Kuleshov, Cornell Tech

Location: 60 Fifth Avenue, 7th floor open space

Date: Wednesday, January 29, 2025

While diffusion generative models excel at high-quality image generation, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods on discrete data such as text and biological sequences. Our work takes steps toward closing this gap with a simple and effective framework for discrete diffusion. The framework is easy to understand: it optimizes a mixture of denoising (e.g., masking) losses, and it can be seen as endowing BERT-like models with principled samplers and variational estimators of the log-likelihood. Crucially, our algorithms are not constrained to generate data sequentially, and therefore have the potential to improve long-term planning, controllable generation, and sampling speed. In the context of language modeling, our framework yields masked diffusion language models (MDLMs), which achieve a new state of the art among diffusion models and approach the quality of AR models. Combined with novel extensions of classifier-free and classifier-based guidance mechanisms, these algorithms are also significantly more controllable than AR models. Discrete diffusion extends beyond language to science, where it forms the basis of a new generation of DNA foundation models that set a new state of the art in genome annotation. Discrete diffusion holds the promise of advancing generative modeling and its applications in language understanding and scientific discovery.
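
For readers unfamiliar with the objective, here is a minimal sketch of what a "mixture of denoising (masking) losses" looks like, following the general shape of masked diffusion objectives in the literature; the noise schedule \alpha_t, the weight w(t), and the notation are assumptions for illustration, not necessarily the speaker's exact formulation:

\[
\mathcal{L}(\theta) \;=\; \mathbb{E}_{t \sim \mathcal{U}[0,1]}\; \mathbb{E}_{z_t \sim q(z_t \mid x)} \left[ \, w(t) \sum_{\ell \,:\, z_t^{\ell} = \texttt{[MASK]}} -\log p_\theta\!\left(x^{\ell} \mid z_t\right) \right]
\]

Here q(z_t \mid x) masks each token of x independently with probability 1 - \alpha_t, so the inner sum is exactly a BERT-style masked-language-modeling cross-entropy over the masked positions; averaging it over noise levels t with a schedule-dependent weight w(t) yields a variational (ELBO-style) bound on the log-likelihood, which is the sense in which the framework gives BERT-like models principled likelihood estimators.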