
Andrew Gordon Wilson

Biography

Andrew Gordon Wilson is a Professor in the Courant Institute of Mathematical Sciences and the Center for Data Science at New York University. He received his PhD in machine learning from the University of Cambridge and his BSc in mathematics and physics from the University of British Columbia. Prior to joining NYU, he was a professor at Cornell and a postdoc at CMU. His work focuses on developing a prescriptive foundation for building intelligent systems, involving a mix of methods, empiricism, theory, and applications, often concerning deep neural networks, Gaussian processes, large language models, Bayesian methods, uncertainty representation, and scientific applications. He has served as EXPO Chair, Tutorial Chair, Workshop Chair, and Senior Area Chair at the main machine learning conferences, and has received several awards, including the NSF CAREER Award, the Amazon Research Award, and best paper, reviewer, area chair, and dissertation awards. Outside of work, Andrew is a classical pianist.

Career research highlights include:
  • The discovery that modes in the neural network loss landscape are connected along simple curves, a phenomenon known as mode connectivity.
  • The popular stochastic weight averaging (SWA) optimization procedure, which inspired SAM, model soups, and model merging.
  • Scalable Gaussian processes and the popular GPyTorch library.
  • Informative, non-vacuous generalization bounds for large neural networks and LLMs.
  • The development of Bayesian deep learning as a research area.
  • Resolutions of several generalization phenomena, including benign overfitting and double descent, from a probabilistic perspective.
  • Identifying and understanding fundamental issues in Bayesian model selection.
  • A general method for computing the complete basis of equivariant linear maps for a given representation and symmetry group (rotations, translations, etc.).
  • Discovering that neural networks in fact learn salient features, even when they appear to be dominated by spurious features, and exploiting this observation to significantly improve generalization under distribution shift.
  • The first LLM time-series forecasting model, and the discovery that text-based pre-training can lead to competitive zero-shot time-series forecasting.
  • Pioneering optimization techniques for protein and antibody engineering, yielding designs with high expression and binding affinity to therapeutic targets in the wet lab.
  • The training of many wonderful students I couldn't be more proud of.