Obtaining information from data by mixing them well

Esteban Tabak, CIMS

A central problem
in data mining is the assignment of a joint-probability distribution to
a set of variables, given a sample of independent joint
observations. With such a distribution in hand, one can answer all kinds
of questions about the variables. In particular, one can diagnose the
state of one variable when the others are observed, a problem of
great relevance in medicine (and in many other fields).

This talk will describe a new methodology to perform such an assignment, developed in the context of diagnosing the state of a transplanted heart through the observation of the patient's gene expression in a microarray.

The central algorithm proposed performs this assignment by mapping the original variables onto a jointly-Gaussian set, which can be made independent through a principal component analysis. The map is built iteratively, through a series of steps that normalize the marginal distributions along a random set of orthogonal directions. These can be thought of as "mixing" steps that, paradoxically, reveal detailed information in the data by mapping them to a Gaussian soup that has maximal entropy, and hence no information left at all.
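
The iteration described above can be illustrated with a minimal two-dimensional sketch. It is not the implementation presented in the talk: the rank-based normalization of each marginal, the fixed number of steps, and the names gaussianize_marginal, mixing_step and gaussianize are all assumptions made for illustration. Each step picks a random pair of orthogonal directions (a random rotation) and maps the sample's marginal along each direction to standard-normal scores.

```python
import math
import random
from statistics import NormalDist


def gaussianize_marginal(values):
    """Map a 1-D sample to standard-normal scores through its empirical ranks.

    Assumption: a rank-based map stands in for whatever marginal
    normalization the actual method uses.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n lies strictly inside (0, 1), so inv_cdf is finite.
        scores[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return scores


def mixing_step(points, rng):
    """Rotate 2-D points by a random angle, then normalize both marginals."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates of each point along two random orthogonal directions.
    u = gaussianize_marginal([c * x + s * y for x, y in points])
    v = gaussianize_marginal([-s * x + c * y for x, y in points])
    return list(zip(u, v))


def gaussianize(points, n_steps=50, seed=0):
    """Apply a fixed number of mixing steps (hypothetical stopping rule)."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        points = mixing_step(points, rng)
    return points


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(500)]
    # Strongly dependent, non-Gaussian sample: y is roughly x squared.
    sample = [(x, x * x + 0.1 * rng.gauss(0.0, 1.0)) for x in xs]
    mixed = gaussianize(sample)
    print(mixed[:3])
```

This sketch only transforms the sample itself. To assign a density to the original variables, one would also have to record each step's map and its Jacobian, which is omitted here.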
