CDS Colloquium: Discrete Diffusion Language Models
Speaker: Volodymyr Kuleshov (Cornell Tech)
Location: 60 Fifth Avenue, Room 150
Date: Friday, February 6, 2026
While diffusion generative models excel at high-quality image generation, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods on discrete data such as text or biological sequences. Our work takes steps towards closing this gap via a simple and effective framework for discrete diffusion. The framework is easy to understand: it optimizes a mixture of denoising (e.g., masking) losses, and it can be seen as endowing BERT-like models with principled samplers and variational estimators of log-likelihood. Crucially, our algorithms are not constrained to generate data sequentially, and therefore have the potential to improve long-term planning, controllable generation, and sampling speed.
In the context of language modeling, our framework enables deriving masked diffusion language models (MDLMs), which achieve a new state of the art among diffusion models and approach AR quality. Combined with novel extensions of classifier-free and classifier-based guidance mechanisms, these algorithms are also significantly more controllable than AR models. Discrete diffusion extends beyond language to science, where it forms the basis of a new generation of DNA foundation models. Our largest models focus on plants and set a new state of the art in genome annotation, while also enabling effective generation. Discrete diffusion models hold promise for advancing generative modeling and its applications in language understanding and scientific discovery.
Bio: Volodymyr Kuleshov is the Joan Eliasoph, M.D. Assistant Professor at the Jacobs Technion-Cornell Institute at Cornell Tech and in the Computer Science Department at Cornell University. He obtained his Ph.D. in Computer Science from Stanford University, where he was the recipient of the Arthur Samuel Best Thesis Award.
Kuleshov’s research interests are in the field of generative modeling and its applications in scientific discovery and health. His work has been featured in Nature Biotechnology, Nature Medicine, and Nature Communications, and has been recognized with an NSF CAREER award, an NIH MIRA award, and multiple industry awards. Kuleshov is also a co-founder of Inception AI, a startup developing the world's first diffusion language models.