We are developing robust estimators for multivariate distributions that are computationally efficient and near-optimal in accuracy, focusing on methods that are both theoretically sound and practically effective.
Agnostic distribution learning has been extensively studied in statistics, machine learning, and computer science. While such problems are fairly well understood in the low-dimensional setting, they remain poorly understood in high dimensions: existing estimators are either computationally intractable (their runtime depends exponentially on the dimension) or can tolerate only vanishing amounts of noise (their accuracy decays as the dimension increases). We present the first robust estimators for multivariate distributions that are simultaneously efficient and lose no dimension-dependent factor in accuracy. Our methods allow us to robustly learn Gaussians, product distributions, certain mixtures of such distributions, and general distributions with bounded second moments. Our algorithms work under a very strong adversarial noise model that significantly generalizes Huber's contamination model: rather than mixing in noise at random, an adversary may inspect the samples and replace an arbitrary small fraction of them. We also implemented our algorithms and evaluated their performance on both synthetic and real-world datasets. Our algorithms are capable of running on data with hundreds of dimensions, and have higher accuracy than all alternatives.
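To make the setting concrete, the sketch below illustrates robust mean estimation under adversarial contamination using a simple spectral filtering heuristic: if the empirical covariance has an unusually large eigenvalue, the points with the largest projections onto the top eigenvector are suspected outliers and removed. This is a minimal illustrative sketch, not the exact algorithm or parameter choices from our work; the function name `filter_mean`, the eigenvalue threshold, and the quantile-based removal rule are all assumptions made for the example.

```python
import numpy as np

def filter_mean(X, eps, threshold=10.0, max_iter=50):
    """Illustrative spectral filter for robust mean estimation.

    Repeatedly checks the top eigenvalue of the empirical covariance;
    while it is suspiciously large, removes the eps-fraction of points
    with the largest projections onto the top eigenvector.
    (Threshold and removal rule are simplified for exposition.)
    """
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
        if eigvals[-1] <= threshold:
            break  # covariance looks well-behaved; stop filtering
        v = eigvecs[:, -1]                      # top eigenvector
        scores = (X - mu) @ v                   # projections onto v
        cutoff = np.quantile(np.abs(scores), 1.0 - eps)
        keep = np.abs(scores) < cutoff          # drop the largest scores
        if keep.all():
            break
        X = X[keep]
    return X.mean(axis=0)

# Demo of the contamination setting: 90% inliers from N(0, I) in 50
# dimensions, 10% adversarial points planted far away along one axis.
rng = np.random.default_rng(0)
d, n, eps = 50, 2000, 0.1
mu_true = np.zeros(d)
inliers = rng.normal(size=(int(n * (1 - eps)), d))
outliers = np.zeros((int(n * eps), d))
outliers[:, 0] = 20.0
X = np.vstack([inliers, outliers])

naive_err = np.linalg.norm(X.mean(axis=0) - mu_true)
robust_err = np.linalg.norm(filter_mean(X, eps) - mu_true)
```

On data like this, the naive sample mean is dragged a constant distance toward the planted points, while the filtered estimate remains close to the true mean; the dimension-independent flavor of this guarantee is what distinguishes the estimators described above from earlier approaches.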