and ChIP-chip Data

Wednesday, April 28, 2004, 2-3:00pm, WWH 1314

We present a fast algorithm for detecting and characterizing a cloud of points concentrated around a curve in D-dimensional Euclidean space, where D >= 2. We have adapted this algorithm to analyze both microarray expression data and data from ChIP-chip experiments. The algorithm characterizes the cloud of points by detecting the underlying curve, separating a "stable" set of points around the curve from a set of outliers, and estimating the local variances of the stable set. It generalizes to detecting and characterizing a cloud of points around a d-dimensional Lipschitz graph, where 1 <= d < D. We establish various estimates for its performance.
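
To make the three outputs named above concrete (an underlying curve, a stable/outlier split, and local variance estimates), here is a minimal sketch for the simplest case D = 2. It is not the algorithm presented in the talk: it substitutes a plain running median for curve detection and a median-absolute-deviation (MAD) scale for the local variance. The function name, the `window` size, and the `threshold` are assumptions chosen purely for illustration.

```python
import numpy as np

def fit_curve_and_flag_outliers(x, y, window=51, threshold=3.0):
    """Illustrative stand-in: running-median curve, MAD-based local scale,
    and outlier flags for a 2-D point cloud (not the speakers' method)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)              # process points left to right
    ys = y[order]
    n = len(ys)
    half = window // 2
    curve = np.empty(n)
    scale = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighbours = ys[lo:hi]
        # The median is insensitive to a minority of deviating points.
        curve[i] = np.median(neighbours)
        # 1.4826 * MAD estimates the standard deviation under Gaussian noise.
        scale[i] = 1.4826 * np.median(np.abs(neighbours - curve[i]))
    scale = np.maximum(scale, 1e-12)   # guard against a zero local scale
    outlier = np.abs(ys - curve) > threshold * scale
    # Map the results back to the original point order.
    inverse = np.empty(n, dtype=int)
    inverse[order] = np.arange(n)
    return curve[inverse], scale[inverse], outlier[inverse]
```

Points with `outlier == False` play the role of the "stable" set, and `scale ** 2` is a crude local variance estimate. A running median only tolerates deviating points while they remain a minority inside each window, so this sketch does not address the harder regime the abstract targets.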

When working with microarray and ChIP-chip data, we use the algorithm both for careful normalization and for ranking and identifying differentially expressed genes or enriched sites in ChIP-chip data. Our normalization is necessary for special microarray data (mainly ChIP-chip data), where the set of deviating genes (or DNA fragments) is large and unknown, and its proportion of the whole data set varies locally. Our identification algorithm is both non-parametric and adaptive and therefore achieves superior performance, especially when only a few replicates are available.
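
As a rough illustration of how a fitted curve and local scale could feed the normalization and ranking steps, the sketch below centers each measurement on the curve, divides by the local scale, and ranks points by the size of the standardized deviation. This is an assumed, simplified scheme and not the authors' identification procedure; the function name and its inputs (`curve`, `local_scale`, both taken as already estimated) are hypothetical.

```python
import numpy as np

def rank_by_standardized_deviation(y, curve, local_scale):
    """Normalize each value by the fitted curve and local scale, then rank
    points from most to least deviating (illustrative only)."""
    y = np.asarray(y, dtype=float)
    curve = np.asarray(curve, dtype=float)
    local_scale = np.asarray(local_scale, dtype=float)
    # Normalization: subtract the curve, divide by the local spread.
    z = (y - curve) / np.maximum(local_scale, 1e-12)
    # Ranking: largest |z| first; a stable sort keeps ties in input order.
    ranking = np.argsort(-np.abs(z), kind="stable")
    return z, ranking
```

Because the scale is local, a deviation of the same raw size ranks higher in a low-noise region than in a high-noise one, which is the sense in which such a scheme adapts to the data.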

(Joint work with Joseph McQuown and Bud Mishra).