Andrew Gordon Wilson

Code

Code repositories for group projects and close collaborations.

CoLA

We often wish to do fast matrix operations: matrix multiplies, eigendecompositions, log determinants, solving linear systems. These operations can sometimes be greatly accelerated through GPU parallelization, if we use iterative algorithms (such as Lanczos and CG). Moreover, quite often our modelling assumptions manifest as algebraic structure that can be exploited algorithmically for scalable computation. CoLA (Compositional Linear Algebra) is a framework for scalable linear algebra in machine learning and beyond, providing:
(1) Fast hardware-sensitive (GPU accelerated) iterative algorithms for general matrix operations;
(2) Algorithms that can exploit matrix structure for efficiency;
(3) A mechanism to rapidly prototype different matrix structures and compositions of structures.

CoLA natively supports PyTorch and JAX, as well as (limited) NumPy if JAX is not installed. See the paper, repo, and docs.
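To make the two ideas above concrete (iterative solvers that only need matrix-vector products, and structure that makes those products cheap), here is a minimal sketch in plain Python. It does not use CoLA's API: the helper names `conjugate_gradient` and `diag_plus_rank_one` are hypothetical and chosen only for illustration.

```python
import math

def conjugate_gradient(matvec, b, tol=1e-10, max_iters=1000):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0
    p = list(r)                      # current search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iters):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def diag_plus_rank_one(d, u):
    """Return v -> (diag(d) + u u^T) v in O(n), without forming the matrix."""
    def matvec(v):
        uv = sum(ui * vi for ui, vi in zip(u, v))
        return [di * vi + ui * uv for di, ui, vi in zip(d, u, v)]
    return matvec

# Hypothetical example data: a positive diagonal plus a rank-one term is SPD.
d = [1.0, 2.0, 3.0, 4.0]
u = [0.5, -1.0, 0.25, 2.0]
b = [1.0, 0.0, -1.0, 2.0]

A = diag_plus_rank_one(d, u)
x = conjugate_gradient(A, b)
residual = max(abs(ax - bi) for ax, bi in zip(A(x), b))
```

The solver never sees a dense matrix, so swapping in a different structure (Kronecker, Toeplitz, low rank, and so on) only means swapping the `matvec`.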

GPyTorch

A library that implements state-of-the-art scalable Gaussian processes in PyTorch. See the repo, website, and docs. Includes:
(1) SKI/KISS-GP [older but helpful tutorials in Matlab here]
(2) Deep Kernel Learning [older but helpful tutorials in Matlab here]
(3) Stochastic Variational Deep Kernel Learning
(4) Scalable Kernel Learning by Stochastic Lanczos Expansions
(5) Spectral Mixture Kernels [older but helpful tutorials in Matlab here]
(6) SKIP (scaling SKI/KISS-GP to higher dimensions)
(7) LOVE (constant-time predictive distributions)

GP Kernel Learning Tutorials

Tutorials for SKI/KISS-GP, Spectral Mixture Kernels, Kronecker Inference, and Deep Kernel Learning. The accompanying code is in Matlab and is now mostly out of date; the implementations in GPyTorch are typically much more efficient. However, the tutorial material and code are still very useful for anyone wanting to understand the building blocks of SKI/KISS-GP, Spectral Mixture Kernels, or Kronecker Inference, along with practical advice for using them.

BoTorch

A modern library for efficient, modular, and easy-to-implement Bayesian optimization, based on Monte Carlo acquisition functions, written in PyTorch. See the paper, repo, website, and docs.
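As a rough illustration of what a Monte Carlo acquisition function is, the sketch below estimates expected improvement at a single candidate point by sampling, and compares it with the closed form. It is not BoTorch code: the Gaussian posterior with mean `mu` and standard deviation `sigma`, the incumbent value `best`, and both function names are assumptions made for this example.

```python
import math
import random

def mc_expected_improvement(mu, sigma, best, n_samples, rng):
    """Average the improvement max(f - best, 0) over samples f ~ N(mu, sigma^2)."""
    total = 0.0
    for _ in range(n_samples):
        f = mu + sigma * rng.gauss(0.0, 1.0)   # sample written as mu + sigma * eps
        total += max(f - best, 0.0)
    return total / n_samples

def analytic_expected_improvement(mu, sigma, best):
    """Closed-form expected improvement under the same Gaussian posterior."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * cdf + pdf)

rng = random.Random(0)
mc_ei = mc_expected_improvement(mu=0.5, sigma=1.0, best=0.3, n_samples=20000, rng=rng)
exact_ei = analytic_expected_improvement(mu=0.5, sigma=1.0, best=0.3)
```

The sampling version is the more flexible of the two: the same averaging works for acquisition functions that have no closed form.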

EMLP

The translation equivariance symmetry has enabled convolutional neural networks to provide good generalization on high-dimensional natural signals. Encode pretty much any symmetry you want with our equivariant MLP library. See the paper, repo, docs, and Colab examples. Works in JAX, PyTorch, and other frameworks.
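For readers new to the term, equivariance means that transforming the input and then applying the layer gives the same result as applying the layer and then transforming the output. The plain-Python sketch below checks this for a 1D circular convolution and cyclic shifts. It does not use the EMLP library; `circular_conv` and `shift` are hypothetical helpers written for this example.

```python
def circular_conv(x, kernel):
    """1D circular cross-correlation: y[i] = sum_j kernel[j] * x[(i + j) % n]."""
    n = len(x)
    return [sum(k * x[(i + j) % n] for j, k in enumerate(kernel)) for i in range(n)]

def shift(x, s):
    """Cyclically translate a signal by s positions."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

x = [1.0, 4.0, 2.0, 0.0, 3.0, 5.0]
kernel = [0.5, -1.0, 0.25]

# Shift then convolve, versus convolve then shift.
lhs = circular_conv(shift(x, 2), kernel)
rhs = shift(circular_conv(x, kernel), 2)
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `max_gap` is zero up to floating-point error, which is the translation equivariance that convolutional layers build in by construction.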

Stochastic Weight Averaging (SWA)

SWA is a simple DNN training method that can be used as a drop-in replacement for your favourite optimizers (SGD, Adam, etc.) with improved generalization, faster convergence, and essentially no overhead. Already trained your model? No problem: run SWA for a few epochs on a pre-trained model for an easy performance boost. SWA is now part of the core PyTorch library, so using it is as simple as importing torch.optim.swa_utils, without needing an external repo. Resources: original paper, blog introducing SWA and its mainline PyTorch implementation, SWA in PyTorch docs, native PyTorch implementation, our original repo in PyTorch.
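The core of SWA is a running average of the weights visited by the optimizer. The framework-free sketch below shows that bookkeeping on a toy problem. The quadratic loss, the noise model, and the names `sgd_step`, `update_average`, and `swa_start` are assumptions made for illustration, not the PyTorch implementation.

```python
import random

def sgd_step(w, grad, lr):
    """One plain SGD update."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def update_average(w_swa, w, n_averaged):
    """Running mean: w_swa <- (w_swa * n + w) / (n + 1)."""
    return [(a * n_averaged + wi) / (n_averaged + 1) for a, wi in zip(w_swa, w)]

def noisy_grad(w, rng):
    """Gradient of 0.5 * ||w||^2 plus Gaussian noise standing in for minibatch noise."""
    return [wi + rng.gauss(0.0, 1.0) for wi in w]

rng = random.Random(0)
w = [5.0, -3.0]
swa_start, n_steps, lr = 50, 200, 0.1

w_swa, n_averaged, collected = None, 0, []
for step in range(n_steps):
    w = sgd_step(w, noisy_grad(w, rng), lr)
    if step >= swa_start:            # start averaging after a burn-in phase
        collected.append(w)
        w_swa = list(w) if w_swa is None else update_average(w_swa, w, n_averaged)
        n_averaged += 1
```

In PyTorch, `torch.optim.swa_utils.AveragedModel` keeps this kind of running average for a model's parameters, so you would not normally write the update by hand.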

SWAG

A simple baseline for Bayesian uncertainty in deep learning, implemented in PyTorch. Forms a Gaussian approximate posterior over neural network parameters using optimization iterates, with a modified learning rate schedule. Scalable (works on ImageNet with modern architectures!) and improves generalization accuracy and calibration.
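A minimal sketch of the idea, assuming a diagonal covariance only (a simplification: it ignores any low-rank covariance term and the learning rate schedule, and takes the iterates as given). The function names and the toy iterates are hypothetical.

```python
import math
import random

def fit_diag_gaussian(iterates):
    """Fit N(mean, diag(var)) to weight iterates from first and second moments."""
    n = len(iterates)
    dim = len(iterates[0])
    mean = [sum(w[i] for w in iterates) / n for i in range(dim)]
    second = [sum(w[i] ** 2 for w in iterates) / n for i in range(dim)]
    # Clamp at zero to guard against small negative values from round-off.
    var = [max(s - m * m, 0.0) for s, m in zip(second, mean)]
    return mean, var

def sample_weights(mean, var, rng):
    """Draw one weight vector from the fitted Gaussian."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]

# Toy optimization iterates (two parameters); the second never moves.
iterates = [[1.0, 2.0], [3.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
mean, var = fit_diag_gaussian(iterates)

rng = random.Random(0)
samples = [sample_weights(mean, var, rng) for _ in range(5)]
```

Predictions would then be averaged over networks built from these sampled weights, which is where the uncertainty estimates come from.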

MultiSWAG

Inspired by a Bayesian perspective of model construction and deep ensembles, MultiSWAG forms multiple SWAG approximations, for improved performance over deep ensembles, without additional training time. If you are willing to train a few models, I recommend MultiSWAG over SWAG.

SWALP

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. Stochastic Weight Averaging in Low-Precision Training (SWALP) is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including gradient accumulators. As hardware accelerators develop, SWALP just keeps getting faster. The above implementation is in PyTorch. There's also a QPyTorch implementation.
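One way to see why averaging helps in low precision: if values are quantized with stochastic rounding, the rounding error is zero-mean, so an average kept in higher precision recovers detail that no single 8-bit number can hold. The sketch below shows this for one scalar. The signed fixed-point format (8-bit word, 6 fractional bits) and the function name are assumptions chosen for brevity, and the repository's quantizer may differ.

```python
import math
import random

def quantize_stochastic(x, rng, word_bits=8, frac_bits=6):
    """Round x onto a signed fixed-point grid using stochastic rounding."""
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (word_bits - 1)), 2 ** (word_bits - 1) - 1
    scaled = x * scale
    floor = math.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q] = x.
    q = floor + (1 if rng.random() < scaled - floor else 0)
    return max(lo, min(hi, q)) / scale   # clip to the representable range

rng = random.Random(0)
w = 0.51   # not on the grid: lies between 32/64 and 33/64
quantized = [quantize_stochastic(w, rng) for _ in range(10000)]
averaged = sum(quantized) / len(quantized)
```

Each entry of `quantized` is either 0.5 or 0.515625, yet `averaged` lands close to 0.51. SWALP applies the same effect to the weights produced by low-precision SGD.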

fast-SWA and Semi-Supervised Learning

Provides a PyTorch implementation of fast-SWA and the record-breaking semi-supervised results in There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average.

Subspace Inference

Subspace Inference for Bayesian Deep Learning makes Bayesian deep learning tractable by greatly reducing the dimensionality of the parameter space, while retaining significant functional variability. Implemented in PyTorch.

LieConv

A method in PyTorch for building convolutional layers which are equivariant to a wide range of symmetry groups, such as translation, rotation, intensity, and scale. If you want a method that can encode a variety of symmetries for modelling high-dimensional data, like images, then LieConv is likely a more practical option than the EMLP.

Learning Invariances in Neural Networks

Most symmetries, such as translation invariance, are hard-coded. However, we don't often know which symmetries are present in the data, or to what extent we want these symmetries. For example, as we rotate a '6' its label remains unchanged until it starts to look like a '9'. This PyTorch repository implements Augerino, a simple method for automatically Learning Invariances in Neural Networks.

Constrained Hamiltonian Neural Networks

By making constraints explicit, rather than working with implicit constraints in angular coordinates, constrained Hamiltonian neural networks (CHNNs) dramatically improve data efficiency, accuracy, and the ability to solve challenging new tasks. This PyTorch framework implements CHNNs and introduces a variety of challenging new benchmarks, including ChainPendulum, CoupledPendulum, MagnetPendulum, Gyroscope, and Rotor.

Cyclical Stochastic Gradient MCMC

Cyclical MCMC is a state-of-the-art MCMC procedure for Bayesian deep learning. cSGMCMC uses a cyclical learning rate schedule to encourage exploration of the sophisticated neural network posterior landscape. Implemented in PyTorch.
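A cyclical schedule of this kind can be sketched as a cosine step size with restarts: each cycle starts with a large step (exploration) and decays towards a small one (sampling near a mode). The form below is one common choice and is an assumption here, as are the function and parameter names; check the repository for the exact schedule it uses.

```python
import math

def cyclical_stepsize(k, total_iters, num_cycles, alpha0):
    """Cosine cyclical step size at iteration k (1-indexed)."""
    cycle_len = math.ceil(total_iters / num_cycles)
    position = ((k - 1) % cycle_len) / cycle_len   # progress within the cycle, in [0, 1)
    return 0.5 * alpha0 * (math.cos(math.pi * position) + 1.0)

# Three cycles of four iterations each, with a hypothetical initial step size of 0.1.
schedule = [
    cyclical_stepsize(k, total_iters=12, num_cycles=3, alpha0=0.1)
    for k in range(1, 13)
]
```

The step size resets to `alpha0` at iterations 1, 5, and 9, and decreases monotonically inside each cycle.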

HMC for Bayesian Deep Learning

This repository contains methods and checkpoints for high-fidelity Hamiltonian Monte Carlo samples from the posteriors of modern neural networks. Because HMC is so computationally expensive, these resources are primarily intended for exploring "What Are Bayesian Neural Networks Really Like?" The HMC samples are also the foundation for the NeurIPS 2021 competition on Approximate Inference in Bayesian Deep Learning.

Semi-Supervised Learning with Normalizing Flows
A PyTorch library to do semi-supervised deep learning using an exact coherent likelihood over both unlabelled and labelled data! The procedure uses a normalizing flow mapping to a latent mixture model. This method keeps getting better with time, as the design of invertible architectures keeps improving.

Simplex Ensembling

A PyTorch framework for training whole low-loss simplexes of solutions, which can be ensembled for a method that pound-for-pound outperforms deep ensembles.

Bayesian Deep Learning under Covariate Shift

Despite the popularity of Bayesian methods on out-of-distribution (OOD) tasks, it turns out there are dangers of Bayesian model averaging under covariate shift. This PyTorch library implements new priors in Bayesian deep learning to help provide robustness for OOD generalization.

Word2GM
Implements probabilistic Gaussian mixture word embeddings in TensorFlow.

BayesGAN
Implements the Bayesian GAN in TensorFlow.

Hierarchical Density Order Embeddings
Provides a Torch implementation of our ICLR 2018 paper. In this paper we learn hierarchical representations of concepts using encapsulation of probability densities.

Probabilistic FastText
Provides a C++ implementation of our ACL 2018 paper. In this paper we learn density embeddings that account for sub-word structure and multiple senses.

Gaussian Processes for Machine Learning

The iconic GPML toolbox, the official software accompaniment to the Gaussian processes for machine learning textbook. GPML includes native support for Spectral Mixture Kernels, Kronecker Inference, and SKI/KISS-GP. Tutorials for this material based on GPML can be found here.