Agnostic distribution learning has been extensively studied in statistics, machine learning, and computer science. While such problems are fairly well understood in the low-dimensional setting, the high-dimensional setting is not. Existing estimators are either computationally intractable (their runtime depends exponentially on the dimension) or can only handle vanishing amounts of noise (their accuracy decays as the dimension increases). We present the first robust estimators for multivariate distributions that are simultaneously efficient and do not lose a dimension-dependent factor in accuracy. Our methods allow us to robustly learn Gaussians, product distributions, certain mixtures of such distributions, and general distributions with bounded second moments. Our algorithms work under a very strong adversarial noise model, which significantly generalizes Huber's contamination model. We also implemented our algorithms and evaluated their performance on both synthetic and real-world datasets. Our algorithms scale to data with hundreds of dimensions and achieve higher accuracy than all alternatives we compared against.
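To give a flavor of the filtering idea behind this line of work, here is a minimal sketch of a spectral filter for robust mean estimation of a spherical Gaussian under adversarial contamination. All parameter choices (the eigenvalue threshold, the fraction of points removed per round, the outlier placement) are illustrative assumptions for this demo, not the exact procedure from our papers:

```python
import numpy as np

def filter_mean(X, eps, max_iters=20):
    """Iterative spectral filter (sketch): while the empirical covariance
    has an abnormally large top eigenvalue, remove the points that deviate
    most along the corresponding eigenvector, then return the mean."""
    X = X.copy()
    for _ in range(max_iters):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
        top_val, top_vec = vals[-1], vecs[:, -1]
        if top_val <= 1.5:                        # assumed threshold: covariance close to identity
            break
        proj = np.abs((X - mu) @ top_vec)         # deviation along the suspicious direction
        keep = proj <= np.quantile(proj, 1 - eps / 2)  # drop the most deviant tail
        X = X[keep]
    return X.mean(axis=0)

rng = np.random.default_rng(0)
d, n, eps = 50, 2000, 0.1
mu_true = np.zeros(d)
inliers = rng.normal(size=(n - int(eps * n), d))
outliers = rng.normal(size=(int(eps * n), d)) + 8.0 * np.eye(d)[0]  # cluster shifted along e1
X = np.vstack([inliers, outliers])

naive_err = np.linalg.norm(X.mean(axis=0) - mu_true)
robust_err = np.linalg.norm(filter_mean(X, eps) - mu_true)
print(f"naive mean error:  {naive_err:.3f}")
print(f"robust mean error: {robust_err:.3f}")
```

On this contaminated sample, the naive empirical mean is pulled toward the planted outlier cluster by roughly an eps-fraction of the shift, while the filter detects the inflated variance direction, strips the outliers, and recovers the true mean much more accurately.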
If you would like to contact us about our work, please scroll down to the People section and reach out to one of the group leads directly through their personal pages.