THESIS DEFENSE: Scalable Methodologies for Optimizing Over Probability Distributions

Speaker

Lingxiao Li
CSAIL MIT

Host

Justin Solomon
CSAIL MIT
Zoom link: https://mit.zoom.us/j/5605527737?pwd=QmxvNXU0UEx0YlFPVFYvdzlubDByZz09


Abstract: Modern machine learning applications, such as generative modeling and probabilistic inference, demand a new generation of methodologies for optimizing over the space of probability distributions, where the optimization variable represents a weighted population of potentially infinitely many points. Despite the ubiquity of these distributional optimization problems, there has been a shortage of scalable methods grounded in mathematical principles. To bridge this gap, this thesis introduces two complementary lines of work for scalable distributional optimization.

The first part of this thesis focuses on optimizing over discrete distributions to generate high-quality samples for probabilistic inference. We present two works that tackle sampling by optimizing pairwise interaction energies defined on a collection of particles. The first work designs a new family of mollified interaction energies over moving particles, offering a unified framework for constrained and unconstrained sampling. The second work scalably optimizes a popular family of interaction energies---maximum mean discrepancy of mean-zero kernels---to generate high-quality coresets from millions of biased samples, obtaining unbiased coresets that outperform i.i.d. sampling.
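For context on the energy minimized in the second work, a standard way to write the squared maximum mean discrepancy between an n-point coreset \(\{x_i\}\) and a target distribution \(\pi\) under a kernel \(k\) is the following (the notation here is generic, not taken from the thesis):

\[
\mathrm{MMD}_k^2\Big(\tfrac{1}{n}\textstyle\sum_{i} \delta_{x_i},\, \pi\Big)
= \frac{1}{n^2} \sum_{i,j=1}^{n} k(x_i, x_j)
- \frac{2}{n} \sum_{i=1}^{n} \mathbb{E}_{y \sim \pi}[k(x_i, y)]
+ \mathbb{E}_{y, y' \sim \pi}[k(y, y')].
\]

For a mean-zero kernel, i.e., one with \(\mathbb{E}_{y \sim \pi}[k(x, y)] = 0\) for every \(x\) (a Stein kernel built from the score of \(\pi\) has this property by construction), the last two terms vanish, and minimizing MMD reduces to minimizing the pairwise interaction energy \(\frac{1}{n^2}\sum_{i,j} k(x_i, x_j)\) over the coreset points.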

The second part transitions to optimizing over continuous distributions through neural network parameterization, enabling the generation of endless streams of samples once optimization completes. We tackle the challenges of identifying suitable mathematical formulations and devising scalable optimization algorithms in three contexts: 1) averaging distributions in a geometrically meaningful manner using a regularized dual formulation of the Wasserstein barycenter problem; 2) identifying local minima of non-convex optimization problems via a generative model, by learning proximal operators with global convergence guarantees; and 3) solving mass-conserving differential equations of probability flows without temporal or spatial discretization by leveraging the self-consistency of the dynamical system.
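As a rough mathematical sketch of the first and third settings (generic notation, not the thesis's own): the Wasserstein barycenter of distributions \(\mu_1, \dots, \mu_m\) with weights \(\lambda_i \ge 0\), \(\sum_i \lambda_i = 1\), solves

\[
\min_{\nu} \; \sum_{i=1}^{m} \lambda_i \, W_2^2(\mu_i, \nu),
\]

while a mass-conserving probability flow with density \(\rho_t\) and velocity field \(v_t\) obeys the continuity equation

\[
\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0.
\]

Self-consistency, in this setting, refers to asking a neural parameterization of the flow to satisfy such an equation at all times and positions simultaneously, rather than on a temporal or spatial grid.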

Bio: Lingxiao Li is a final-year Ph.D. student at MIT advised by Justin Solomon. Previously, he obtained a Master's degree in Mathematics and Bachelor's degrees in Computer Science and Mathematics, all at Stanford University. During his Ph.D., he interned at Adobe Research with Noam Aigerman and Vova Kim, and at Microsoft Research with Lester Mackey.

Committee: Justin Solomon (MIT), Ashia Wilson (MIT), Lester Mackey (Microsoft Research)