Learning $k$-Modal Distributions via Testing

Daskalakis, Constantinos; Diakonikolas, Ilias; Servedio, Rocco A.

doi:10.4086/toc.2014.v010a020

Volume 10 (2014) Article 20 pp. 535-570

Learning $k$-Modal Distributions via Testing

by Constantinos Daskalakis, Ilias Diakonikolas, and Rocco A. Servedio

Received: April 11, 2013
Revised: November 10, 2014
Published: December 31, 2014

Download article from ToC site:
[PDF (394K)] [PS (1748K)] [Source ZIP]

Misc.:
[About the Authors] [HTML Bibliography] [BibTeX]

Keywords: computational learning theory, learning distributions, $k$-modal distributions

Categories: learning, learning distributions

ACM Classification: F.2.2, G.3

AMS Classification: 68W20, 68Q25, 68Q32

Abstract: [Plain Text Version]

$ \newcommand{\eps}{\epsilon} \newcommand{\poly}{\mathrm{poly}} \newcommand{\wh}[1]{{\widehat{#1}}} $

A $k$-modal probability distribution over the discrete domain $\{1,...,n\}$ is one whose histogram has at most $k$ “peaks” and “valleys.” Such distributions are natural generalizations of monotone ($k=0$) and unimodal ($k=1$) probability distributions, which have been intensively studied in probability theory and statistics.

In this paper we consider the problem of learning (i.e., performing density estimation of) an unknown $k$-modal distribution with respect to the $L_1$ distance. The learning algorithm is given access to independent samples drawn from an unknown $k$-modal distribution $p$, and it must output a hypothesis distribution $\widehat{p}$ such that with high probability the total variation distance between $p$ and $\widehat{p}$ is at most $\eps.$ Our main goal is to obtain computationally efficient algorithms for this problem that use (close to) an information-theoretically optimal number of samples.

We give an efficient algorithm for this problem that runs in time $\poly(k,\log(n),1/\eps)$. For $k \leq \tilde{O}( {\log n})$, the number of samples used by our algorithm is very close (within an $\tilde{O}(\log(1/\eps))$ factor) to being information-theoretically optimal. Prior to this work computationally efficient algorithms were known only for the cases $k=0,1$ (Birgé 1987, 1997).

A novel feature of our approach is that our learning algorithm crucially uses a new algorithm for property testing of probability distributions as a key subroutine. The learning algorithm uses the property tester to efficiently decompose the $k$-modal distribution into $k$ (near-)monotone distributions, which are easier to learn.

A preliminary version of this work appeared in the Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012).

DOI: 10.4086/toc.2014.v010a020

Licensed under a Creative Commons Attribution License (CC-BY)