Using evolutionary sequence variation to make inferences about
protein structure and function
Lucy Colwell, Cambridge
Abstract:
The evolutionary trajectory of a protein through sequence space is
constrained by its function. Collections of sequence homologs record
the outcomes of millions of evolutionary experiments in which the
protein evolves according to these constraints. The explosive growth
in the number of available protein sequences raises the possibility
of using the natural variation present in homologous protein
sequences to infer these constraints and thus identify residues that
control different protein phenotypes. Because in many cases
phenotypic changes are controlled by more than one amino acid, the
mutations that separate one phenotype from another may not be
independent, requiring us to understand the correlation structure of
the data. To address this, we build a maximum entropy model of the
protein sequence, constrained by the statistics of a large sequence
alignment. Using this model, we infer residue pair interactions that
accurately predict which residues lie in close spatial proximity in the
protein's tertiary structure. These predictions are used to generate
all-atom structural models. We then apply our method to predict de
novo the structures of 11 medically important transmembrane proteins
of unknown structure. In addition, we are able to predict protein
quaternary structure and alternative conformations. The next step
is to develop a theoretical inference framework that clarifies how
the reliability of structural predictions depends on the amount of
available input data.
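The abstract does not say how the residue pair interactions are inferred from the alignment. As a minimal sketch, assuming a Potts-type maximum entropy model fitted with the mean-field approximation (one widely used scheme, not necessarily the one used in this work), the path from alignment statistics to a ranked list of candidate contacts can look like this. The function names, the 21-letter alphabet, the pseudocount of 0.5, the Frobenius-norm score and the average product correction (APC) are illustrative assumptions, not details taken from the source.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed alphabet: 20 amino acids plus gap
Q = len(ALPHABET)


def one_hot(alignment):
    """Encode equal-length aligned sequences as an (M, L, Q) 0/1 array."""
    index = {a: k for k, a in enumerate(ALPHABET)}
    M, L = len(alignment), len(alignment[0])
    X = np.zeros((M, L, Q))
    for m, seq in enumerate(alignment):
        for i, a in enumerate(seq):
            X[m, i, index.get(a, Q - 1)] = 1.0  # unknown symbols mapped to gap
    return X


def mean_field_contacts(alignment, pseudocount=0.5):
    """Hypothetical helper: rank residue pairs by APC-corrected coupling strength."""
    X = one_hot(alignment)
    M, L, _ = X.shape
    flat = X.reshape(M, L * Q)  # column index = site * Q + state

    # 1. Single-site and pair frequencies: the alignment statistics the
    #    maximum entropy model must reproduce (regularized by a pseudocount).
    fi = (1 - pseudocount) * flat.mean(axis=0) + pseudocount / Q
    fij = (1 - pseudocount) * (flat.T @ flat) / M + pseudocount / Q**2
    for i in range(L):  # same-site blocks must equal diag(f_i)
        s = slice(i * Q, (i + 1) * Q)
        fij[s, s] = np.diag(fi[s])
    C = fij - np.outer(fi, fi)  # connected correlations

    # 2. Mean-field inference (assumed scheme): couplings J = -C^{-1}, with the
    #    last state of each site dropped to remove the gauge redundancy.
    keep = np.array([k % Q != Q - 1 for k in range(L * Q)])
    J = np.zeros((L * Q, L * Q))
    J[np.ix_(keep, keep)] = -np.linalg.inv(C[np.ix_(keep, keep)])
    J = J.reshape(L, Q, L, Q).transpose(0, 2, 1, 3)  # J[i, j, a, b]

    # 3. Shift every Q x Q block to the zero-sum gauge and score it by its norm.
    J = (J - J.mean(axis=2, keepdims=True) - J.mean(axis=3, keepdims=True)
         + J.mean(axis=(2, 3), keepdims=True))
    S = np.sqrt((J**2).sum(axis=(2, 3)))
    np.fill_diagonal(S, 0.0)

    # 4. Average product correction to suppress site-specific background.
    row = S.sum(axis=1) / (L - 1)
    total = S.sum() / (L * (L - 1))
    S_apc = S - np.outer(row, row) / total
    pairs = [(S_apc[i, j], i, j) for i in range(L) for j in range(i + 1, L)]
    return sorted(pairs, reverse=True)


# Usage on a hypothetical toy alignment: column 3 is a deterministic function of column 1.
rng = np.random.default_rng(0)
letters = np.array(list("ACDE"))
states = rng.integers(0, 4, size=(500, 6))
states[:, 3] = (states[:, 1] + 1) % 4
toy_alignment = ["".join(letters[row]) for row in states]
ranked = mean_field_contacts(toy_alignment)
print(ranked[:3])
```

Because column 3 covaries perfectly with column 1 while the other columns are independent, the pair (1, 3) should come out on top. With a real alignment of homologs, the highest-scoring pairs would be the candidate structural contacts that feed into model building.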