A new theoretical framework for clustering

Speaker: Avrim Blum , Carnegie Mellon University
Date: April 6 2010
Time: 4:15PM to 5:15PM
Location: 32-155
Host: Scott Aaronson, CSAIL, MIT
Contact: Be, 3-6098, imbe@mit.edu
Relevant URL: >
> Theoretical treatments of clustering have generally been of two types:
> either on algorithms for (approximately) optimizing various
> distance-based objectives such as k-median, k-means, min-sum, etc, or
> on clustering under probabilistic "generative model" assumptions such
> as mixtures of Gaussian or related distributions.
>
> In this work we propose a new approach to analyzing the problem of
> clustering. Building on models used in computational learning theory,
> we consider the goal of approximately recovering an unknown target
> clustering from given pairwise similarity or distance information,
> without making generative-model assumptions about the data. Instead,
> we ask: what relations between the similarity/distance information and
> the desired clustering are sufficient to cluster well -- these
> relations taking the role of the "concept class" in learning theory.
> Within this framework, we then present a number of interesting
> graph-theoretic and game-theoretic properties that we show are
> sufficient for an algorithm to succeed.
>
> As one concrete application, we show that if our distances have the
> property that any 1.1-approximation to the k-median (or k-means or
> min-sum) objective would have error < epsilon wrt the target (an ostensible
> reason for *wanting* to approximate the objective to a 1.1 factor) then
> we can efficiently find a solution that is O(epsilon)-close to the
> target, even though achieving a 1.1 approximation to these objectives
> is NP-hard. (One can replace 1.1 with 1+alpha for any constant
> alpha>0). I will also mention some interesting extensions and open
> problems in this direction.
>
> This talk is based on work joint with Maria-Florina Balcan, Anupam
> Gupta, and Santosh Vempala.
See other events that are part of Theory Colloquium 2009/2010
See other events happening in April 2010