Cluster Analysis in High Dimensions: Robustness, Privacy, and Beyond

Speaker

Shyam Narayanan
MIT CSAIL

Host

Piotr Indyk
Abstract: Cluster analysis is perhaps one of the most important subfields in high-dimensional data analysis. Traditionally, cluster analysis focuses on partitioning data into closely related groups, such as k-means clustering and learning mixture models. However, one sometimes overlooked part is understanding the location and shape of a single cluster: this encompasses problems such as mean estimation and covariance estimation. In this thesis, we study various classic problems relating to both identifying several clusters and learning a single cluster, for high-dimensional data. Additionally, we consider various socially motivated constraints such as robustness, privacy, and explainability, and how they affect the complexity of these problems.

For the presentation, I will focus primarily on learning a single cluster, which connects deeply with natural parameter estimation and hypothesis testing questions. Moreover, I will focus on the relationship between robustness and differential privacy for these questions.

Thesis Committee:
Piotr Indyk (Advisor) - MIT
Sam Hopkins - MIT
Ronitt Rubinfeld - MIT
Jelani Nelson - UC Berkeley