Bayesian Machine Learning
ORIE 6741
Fall 2016
Course Information
Title: Bayesian Machine Learning
Course Number: ORIE 6741
Semester: Fall 2016
Times: Tu/Th 11:40 am - 12:55 pm
Room: Hollister Hall 320
Course Syllabus [PDF]
Instructor
Andrew Gordon Wilson
Assistant Professor
Rhodes Hall 235
Website: https://people.orie.cornell.edu/andrew
E-Mail: andrew@cornell.edu
Office Hours: Tuesday 4:00 pm - 5:00 pm, or by appointment
Overview
To answer scientific questions and reason
about data, we must build models and perform inference
within those models. But how should we approach
model construction and inference to make the most
successful predictions? How do we represent
uncertainty and prior knowledge? How flexible should
our models be? Should we use a single model, or
multiple different models? Should we follow a
different procedure depending on how much data are
available?
In this course, we will approach these
fundamental questions from a Bayesian perspective.
From this perspective, we wish to faithfully incorporate
all of our beliefs into a model, and to represent
uncertainty over these beliefs using probability
distributions.
Typically, we believe the real world is in a sense infinitely
complex: we will always be able to add flexibility
to a model to gain better performance. If we are
performing character recognition, for instance, we can
always account for additional writing styles for
greater predictive success. We should therefore aim
to maximize flexibility, so that we are capable of
expressing any hypothesis we believe to be possible.
For inference, we will not have a priori certainty that
any one hypothesis has generated our observations.
We therefore typically wish to weight an uncountably
infinite space of hypotheses by their posterior
probabilities. This Bayesian model averaging
procedure has no risk of overfitting, no matter how
flexible our model. How we distribute our a priori
support over these different hypotheses determines our inductive
biases. In short, a model should distribute
its support across as wide a range of hypotheses as
possible, and have inductive biases which are aligned to
particular applications.
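To make this averaging concrete (a standard formulation;
the notation here is illustrative and not taken from the
course materials): for a model with parameters w, observed
data D, and a test input x_*, the posterior predictive
distribution weights every hypothesis by its posterior
probability:

\[
p(y_* \mid x_*, \mathcal{D})
  = \int p(y_* \mid x_*, w)\, p(w \mid \mathcal{D})\, dw,
\qquad
p(w \mid \mathcal{D})
  = \frac{p(\mathcal{D} \mid w)\, p(w)}{p(\mathcal{D})}.
\]

Because predictions average over the posterior rather than
committing to a single fitted w, added flexibility does not
by itself cause overfitting; the prior p(w) is where the
inductive biases live.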
This course aims to provide students with a
strong grasp of the fundamental principles underlying
Bayesian model construction and inference. We will
go into particular depth on Gaussian process and deep
learning models.
The course will consist of three units:
Model Construction and Inference:
Parametric models, support, inductive biases, gradient
descent, sum and product rules, graphical models, exact
inference, approximate inference (Laplace approximation,
variational methods, MCMC), model selection and hypothesis
testing, Occam's razor, non-parametric models.
Gaussian Processes: From finite basis
expansions to infinite bases, kernels, function space
modelling, marginal likelihood, non-Gaussian likelihoods,
Bayesian optimisation. (An illustrative regression sketch
follows this list.)
Bayesian Deep Learning: Feed-forward,
convolutional, recurrent, and LSTM networks.
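As a preview of the Gaussian process unit (a minimal
sketch, assuming NumPy; the RBF kernel, noise level, and
toy data are placeholder choices rather than course
material), exact GP regression and its log marginal
likelihood come down to a few lines of linear algebra:

import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel matrix for 1-D inputs.
    sqdist = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

def gp_regression(X_train, y_train, X_test, noise=0.1):
    # Exact GP posterior via the Cholesky factorisation,
    # following the standard textbook recipe.
    n = len(X_train)
    K = rbf_kernel(X_train, X_train) + noise ** 2 * np.eye(n)
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)           # K = L L^T (stable inverse)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                # posterior mean at X_test
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                # posterior covariance
    # Log marginal likelihood: the model-selection quantity
    # mentioned in the unit outline above.
    log_ml = (-0.5 * y_train @ alpha
              - np.sum(np.log(np.diag(L)))
              - 0.5 * n * np.log(2 * np.pi))
    return mean, cov, log_ml

# Toy usage: noisy observations of a sine function.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
mu, Sigma, lml = gp_regression(X, y, np.linspace(-4, 4, 100))

The Cholesky factorisation is used in place of a direct
matrix inverse for numerical stability.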
Depending on the available time, we may omit
some of these topics. Most of the material will be
derived on the chalkboard, with some supplemental
slides. The course will have both theoretical and
practical (e.g. coding) aspects.
After taking this course, you should:
- Be able to think about any problem from a
Bayesian perspective.
- Be able to create models with a high degree
of flexibility and appropriate inductive biases.
- Understand the interplay between model
specification and inference, and be able to construct a
successful inference algorithm for a given model.
- Have familiarity with Gaussian process and
deep learning models.
Announcements
Schedule