We present a fast algorithm for detecting and characterizing
a cloud of points that is concentrated around a curve in a D-dimensional
Euclidean space, where D >= 2. We have adapted this algorithm to analyze
both microarray expression data and data from ChIP-chip experiments. The
algorithm characterizes the cloud of points by detecting the underlying curve,
separating a "stable" set of points around the curve from a set of
outliers, and estimating the local variances of the stable set. It generalizes
to detecting and characterizing a cloud of points around a d-dimensional
Lipschitz graph, where 1 <= d < D. We establish various estimates for
its performance.
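
The abstract does not spell out the algorithm itself, so the following is only a
minimal sketch of the three tasks named above (curve detection, separation of
stable points from outliers, local variance estimation) for the simplest case
D = 2, d = 1. It substitutes a running median and a local median absolute
deviation (MAD) for the actual method; the function name characterize_cloud and
the parameters window and cutoff are assumptions made for illustration.

```python
import statistics


def characterize_cloud(points, window=21, cutoff=3.0):
    """Illustrative stand-in, not the method of the talk.

    points: list of (x, y) pairs assumed to scatter around a curve y = f(x).
    Returns one dict per input point with the estimated curve value, the
    estimated local variance, and an outlier flag.
    """
    # Work in order of increasing x so that a window of ranks is a local window.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    ys = [points[i][1] for i in order]
    n = len(ys)
    half = window // 2

    def local(values, k):
        # Values whose x-rank lies within `half` of rank k (truncated at the ends).
        return values[max(0, k - half):min(n, k + half + 1)]

    # Step 1: running median as a robust estimate of the underlying curve.
    curve = [statistics.median(local(ys, k)) for k in range(n)]
    residuals = [y - c for y, c in zip(ys, curve)]

    # Step 2: local scale from the MAD of nearby residuals; the factor 1.4826
    # makes the MAD consistent with the standard deviation for Gaussian noise.
    abs_res = [abs(r) for r in residuals]
    scale = [1.4826 * statistics.median(local(abs_res, k)) for k in range(n)]

    # Step 3: a point is an outlier if it lies more than `cutoff` local scales
    # away from the curve; all remaining points form the stable set.
    report = [None] * n
    for k, i in enumerate(order):
        report[i] = {
            "curve": curve[k],
            "local_variance": scale[k] ** 2,
            "outlier": abs(residuals[k]) > cutoff * scale[k],
        }
    return report


if __name__ == "__main__":
    # Synthetic cloud around the flat curve y = 0 with three planted outliers.
    pts = [(float(i), ((4 * i) % 11 - 5) * 0.1) for i in range(100)]
    for j in (30, 50, 70):
        pts[j] = (pts[j][0], pts[j][1] + 10.0)
    flagged = [i for i, r in enumerate(characterize_cloud(pts)) if r["outlier"]]
    print("flagged indices:", flagged)
```

Because both the curve and the scale are medians over a local window, the
outlier threshold adapts to regions where the spread of the cloud changes,
which mirrors the locally varying setting described in the next paragraph.
The window size trades smoothness of the curve against locality and would need
tuning on real data.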
When working with microarray and ChIP-chip data, we use the algorithm both for
careful normalization and for ranking and identifying differentially expressed
genes or enriched sites in ChIP-chip data. Our normalization is necessary for
special microarray data (mainly ChIP-chip data), where the set of deviating
genes (or DNA fragments) is large and unknown, and its proportion of the whole
data varies locally. Our identification algorithm is both non-parametric and
adaptive and therefore achieves superior performance, especially when only a
few replicates are available.
(Joint work with Joseph McQuown and Bud Mishra).