Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

  • Ilias Diakonikolas,
  • Daniel M. Kane,
  • Daniel Kongsgaard,
  • Jerry Li,
  • Kevin Tian

STOC 2022

We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset. Specifically, we are given a set \(T\) of \(n\) points in \(\mathbb{R}^d\) and a parameter \(0 < \alpha < \frac{1}{2}\) such that an \(\alpha\)-fraction of the points in \(T\) are i.i.d. samples from a well-behaved distribution \(D\) and the remaining \((1-\alpha)\)-fraction are arbitrary. The goal is to output a small list of vectors, at least one of which is close to the mean of \(D\). We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees, with running time \(O(n^{1+\epsilon_0} d)\) for any fixed \(\epsilon_0 > 0\). All prior algorithms for this problem had additional polynomial factors in \(\frac{1}{\alpha}\). We leverage this result, together with additional techniques, to obtain the first almost-linear time algorithms for clustering mixtures of \(k\) separated well-behaved distributions, nearly-matching the statistical guarantees of spectral methods. Prior clustering algorithms inherently relied on an application of \(k\)-PCA, thereby incurring runtimes of \(\Omega(ndk)\). This marks the first runtime improvement for this basic statistical problem in nearly two decades.
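As a concrete illustration of this setup (not taken from the paper), the following Python sketch simulates a corrupted dataset \(T\) in which only an \(\alpha\)-fraction of points come from the inlier distribution, here assumed to be an identity-covariance Gaussian, and evaluates a candidate list by the error of its best hypothesis. The names `alpha`, `true_mean`, and `list_error` are illustrative.

```python
# Minimal simulation of the list-decodable mean estimation setup.
# Assumption: inliers are i.i.d. from an identity-covariance Gaussian;
# outliers are arbitrary (here, a shifted, wider Gaussian).
import numpy as np

rng = np.random.default_rng(0)

n, d, alpha = 10_000, 50, 0.1           # only an alpha-fraction are inliers
true_mean = rng.normal(size=d)

n_in = int(alpha * n)
inliers = rng.normal(loc=true_mean, size=(n_in, d))             # samples from D
outliers = rng.normal(loc=-5.0, scale=3.0, size=(n - n_in, d))  # adversarial part
T = rng.permutation(np.vstack([inliers, outliers]))

def list_error(hypotheses, mu):
    """Error of the best hypothesis in the list: min over h of ||h - mu||_2."""
    return min(np.linalg.norm(h - mu) for h in hypotheses)

# Naive baseline: with alpha < 1/2, the sample mean of T is badly corrupted,
# which is why a *list* of candidate means is the right output format.
print("sample-mean error:", np.linalg.norm(T.mean(axis=0) - true_mean))
```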

The starting point of our approach is a novel and simpler near-linear time robust mean estimation algorithm in the \(\alpha \to 1\) regime, based on a one-shot matrix multiplicative weights-inspired potential decrease. We crucially leverage this new algorithmic framework in the context of the iterative multi-filtering technique of Diakonikolas et al. '18, '20, providing a method to simultaneously cluster and downsample points using one-dimensional projections, thus bypassing the \(k\)-PCA subroutines required by prior algorithms.
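To make the "project and downweight" idea concrete, here is a minimal sketch of one soft filtering step along a single one-dimensional projection. This is not the paper's algorithm, which combines iterative multi-filtering with a matrix multiplicative weights potential argument; it only illustrates how outlier scores along one direction can multiplicatively downweight suspect points without any \(k\)-PCA computation. The function name `one_d_filter_step` and the particular update rule are assumptions for illustration.

```python
import numpy as np

def one_d_filter_step(X, w, v):
    """One soft filtering step along a unit direction v.

    X : (n, d) data points; w : (n,) nonnegative weights; v : (d,) unit vector.
    Points whose projection deviates most from the weighted mean lose the most
    weight, in a multiplicative-weights-style update (illustrative, not the
    paper's update rule).
    """
    proj = X @ v                                  # one-dimensional projections
    mu = np.average(proj, weights=w)              # weighted 1-D mean
    scores = (proj - mu) ** 2                     # outlier scores along v
    w_new = w * (1.0 - scores / scores.max())     # downweight high scorers
    return w_new / w_new.sum()                    # renormalize

# Example usage: filter along a random direction after planting outliers.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
X[:50] += 10.0                                    # planted outliers
v = rng.normal(size=20)
v /= np.linalg.norm(v)
w = np.full(1000, 1.0 / 1000)
w = one_d_filter_step(X, w, v)
```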