Cluster Analysis in High Dimensions: Robustness, Privacy, and Beyond

Speaker

Shyam Narayanan

MIT CSAIL

Host

Piotr Indyk

Abstract: Cluster analysis is perhaps one of the most important subfields in high-dimensional data analysis. Traditionally, cluster analysis focuses on partitioning data into closely related groups, such as k-means clustering and learning mixture models. However, one sometimes overlooked part is understanding the location and shape of a single cluster: this encompasses problems such as mean estimation and covariance estimation. In this thesis, we study various classic problems relating to both identifying several clusters and learning a single cluster, for high-dimensional data. Additionally, we consider various socially motivated constraints such as robustness, privacy, and explainability, and how they affect the complexity of these problems.

For the presentation, I will focus primarily on learning a single cluster, which connects deeply with natural parameter estimation and hypothesis testing questions. Moreover, I will focus on the relationship between robustness and differential privacy for these questions.

Thesis Committee:
Piotr Indyk (Advisor) - MIT
Sam Hopkins - MIT
Ronitt Rubinfeld - MIT
Jelani Nelson - UC Berkeley

Add to Calendar 2024-04-25 14:30:00 2024-04-25 15:30:00 America/New_York Cluster Analysis in High Dimensions: Robustness, Privacy, and Beyond Abstract: Cluster analysis is perhaps one of the most important subfields in high-dimensional data analysis. Traditionally, cluster analysis focuses on partitioning data into closely related groups, such as k-means clustering and learning mixture models. However, one sometimes overlooked part is understanding the location and shape of a single cluster: this encompasses problems such as mean estimation and covariance estimation. In this thesis, we study various classic problems relating to both identifying several clusters and learning a single cluster, for high-dimensional data. Additionally, we consider various socially motivated constraints such as robustness, privacy, and explainability, and how they affect the complexity of these problems.For the presentation, I will focus primarily on learning a single cluster, which connects deeply with natural parameter estimation and hypothesis testing questions. Moreover, I will focus on the relationship between robustness and differential privacy for these questions.Thesis Committee: Piotr Indyk (Advisor) - MITSam Hopkins - MITRonitt Rubinfeld - MITJelani Nelson - UC Berkeley 32-G449

Organizer & Contact

Joanne Talbot Hanley

joanne@csail.mit.edu

617-253-6054

Cluster Analysis in High Dimensions: Robustness, Privacy, and Beyond

Speaker

Host

April 25 2024

Location

Organizer & Contact