Projection Pursuit, Gaussian Scale Mixtures, and the EM Algorithm

Speaker: Sanjoy Dasgupta , University of California, San Diego
Date: November 9 2006
Time: 4:00PM to 5:00PM
Location: 32-G475A Stata Center - 4th Floor Gates Lounge
Host: Regina Barzilay, MIT CSAIL
Contact: Marcia Davidson, 617-253-2448, marcia@csail.mit.edu
Relevant URL: The EM algorithm for fitting Gaussian mixture models is one of the most widely-used clustering methods. Yet, surprisingly little is known about its behavior. For instance, there are many different ways to initialize EM, and to merge/remove intermediate clusters, and the effects of these different strategies are not understood in a principled way.
I'll describe a new probabilistic analysis of EM. First of all, it will emerge that many common methods of initializing and running EM produce hopelessly suboptimal results even in ideal clustering scenarios. On the other hand, a particular variant of EM will provably recover a near-optimal clustering, provided that the clusters are adequately separated and that their distributions are of a certain fairly general form.
The type of cluster distributions allowed is motivated by a new result in projection pursuit, along the lines of the folklore that "all projections of high-dimensional data look Gaussian". Specifically, I'll show that for any D-dimensional distribution with finite second moments, there is a precise sense in which almost all of its linear projections into d < D dimensions look like a scale-mixture of spherical Gaussians (concentric spherical Gaussians with the same center). The extent of this effect depends upon the ratio of d to D, and upon the "eccentricity" of the high-dimensional distribution.
Sanjoy Dasgupta is an Assistant Professor in the Department of Computer Science and Engineering at UC San Diego. He received his PhD from Berkeley, and spent two years at AT&T Research Labs before joining UCSD. His area of research is algorithmic statistics, with a focus on unsupervised learning. He has just completed a textbook, "Algorithms" (with Christos Papadimitriou and Umesh Vazirani).
See other events that are part of Language, Learning, Vision and Graphics Seminar Series (LLVG) 2006/2007
See other events happening in November 2006