Modeling Rich Structured Data via Kernel Distribution Embeddings
Speaker: Le Song, Carnegie Mellon University
Date: April 4, 2011
Time: 4:00PM to 5:00PM
Host: Regina Barzilay and Leslie Kaelbling, CSAIL
Contact: Francis Doughty, 253-4602, email@example.com
Real-world applications often produce large volumes of highly uncertain and complex data. Many such problems have rich microscopic structure, where each variable can take values on a manifold (e.g., camera rotations), a combinatorial object (e.g., texts, graphs of drug compounds), or a high-dimensional continuous domain (e.g., images and videos). Furthermore, these problems may possess additional macroscopic structure, where large collections of observed and hidden variables are connected by networks of conditional independence relations (e.g., in predicting depth from still images, or forecasting in time series).
Most previous learning algorithms for problems with such rich structure rely heavily on linear relations and parametric models, where the data are typically assumed to be multivariate Gaussian or discrete with a relatively small number of values. Conclusions inferred under these restrictive assumptions can be misleading if the underlying data-generating processes contain nonlinear, non-discrete, or non-Gaussian components.
How can we find a suitable representation for nonlinear and non-Gaussian relationships in a data-driven fashion? How can we exploit conditional independence structure between variables in richly structured settings? How can we design efficient algorithms to solve challenging nonparametric problems involving large amounts of data?
In this talk, I will introduce a nonparametric representation for distributions, called kernel embeddings, to address these questions. The key idea is to map distributions to their expected features (potentially infinite-dimensional) and, given evidence, to update these representations entirely in the feature space. Existing nonparametric representations are largely restricted to vectorial data and often lead to intractable algorithms; kernel distribution embeddings, by contrast, frequently yield simpler, faster, and more accurate algorithms across a diverse range of problems, such as organizing photo albums, understanding social networks, retrieving documents across languages, predicting depth from still images, and forecasting sensor time series.
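The core idea above, representing a distribution by the average of its (kernel-induced) features and comparing distributions directly in feature space, can be illustrated with a small sketch. The following is a minimal NumPy example, not code from the talk: it uses a Gaussian RBF kernel, whose bandwidth choice here is arbitrary, and computes the squared RKHS distance between two empirical mean embeddings (the maximum mean discrepancy), which requires only kernel evaluations.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Squared distance ||mu_X - mu_Y||^2 between the empirical kernel mean
    # embeddings of two samples (biased V-statistic estimate). Expanding the
    # square gives three terms, each an average of kernel evaluations.
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2.0 * rbf_kernel(X, Y, sigma).mean())

rng = np.random.default_rng(0)
# Two samples from the same distribution vs. samples from shifted ones.
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
# Embeddings of different distributions are farther apart in feature space.
assert diff > same
```

Because the embedding lives in the kernel's feature space, no density estimation or parametric model is needed; all computations reduce to kernel evaluations on the observed data.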
BIO: Le Song is a postdoctoral fellow in the School of Computer Science at Carnegie Mellon University, working with a number of professors, including Eric Xing, Carlos Guestrin, Geoff Gordon, and Jeff Schneider. Prior to that, he obtained his PhD in computer science from the University of Sydney and National ICT Australia in 2008 under the supervision of Alex Smola. His research is in statistical machine learning and data mining, with a primary focus on kernel methods, probabilistic graphical models, and network analysis. He is also interested in large-scale machine learning and in applications to computational biology, text, images, and network analysis.