May 10

Building Probabilistic Structure into Massively Parameterized Models

Ryan P. Adams
Google Brain / Harvard University

4:30pm–5:30pm, 32-141 (Stata Center, 1st Floor)

Abstract: Scientific applications of machine learning typically involve the identification of interpretable structure from high-dimensional observations. It is often challenging, however, to balance the flexibility required for high-dimensional problems against the parsimonious structure that helps us model physical reality. I view this challenge through the lens of semiparametric modeling, in which a massively parameterized function approximator is coupled to a compact and interpretable probabilistic model. Of particular interest in this vein is the merging of deep neural networks with graphical models containing latent variables, which enables each component to play to its strengths. I will discuss several classes of such models and their applications in areas such as astronomy, chemistry, neuroscience, and sports analytics.

Bio: Ryan P. Adams is a research scientist at Google Brain and the leader of the Harvard Intelligent Probabilistic Systems group. He received his PhD in physics from Cambridge University, where he was a Gates Scholar under David J.C. MacKay, before spending two years as a CIFAR Junior Fellow at the University of Toronto. He was an Assistant Professor of Computer Science at Harvard from 2011 to 2016, and will be joining the faculty at Princeton this summer. Ryan has received the Alfred P. Sloan Fellowship, the DARPA Young Faculty Award, and paper awards at ICML, UAI, and AISTATS. He was a co-founder of Whetlab, a startup acquired by Twitter in 2015, and a co-host of the Talking Machines podcast.

April 26

Robust Bayesian inference via coarsening

3:00pm–4:00pm, 32-G575

Abstract: The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure, particularly when the data set is large. We introduce a simple, coherent approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the observed data exactly, one conditions on the event that the model generates data close to the observed data, with respect to a given statistical distance. When closeness is defined in terms of relative entropy, the resulting "coarsened posterior" can be approximated by simply raising the likelihood to a certain fractional power, making the method computationally efficient and easy to implement in practice. We illustrate with real and simulated data, and provide theoretical results.
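
The fractional-power approximation is easy to see in a conjugate example. The sketch below is a minimal illustration (not from the talk): it contrasts a standard posterior with a power ("coarsened") posterior for a Normal mean with known variance, using the calibration zeta = alpha/(alpha + n) suggested in the coarsening literature; the prior parameters and the robustness parameter alpha are assumptions made for the example.

    # Minimal sketch (not from the talk): a coarsened posterior for a Normal mean
    # with known variance, obtained by raising the likelihood to a power zeta.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(loc=1.0, scale=1.0, size=n)   # observed data
    sigma2 = 1.0                                 # known observation variance
    mu0, tau2 = 0.0, 10.0                        # illustrative Normal prior on the mean

    def power_posterior(zeta):
        """Posterior for the mean when the likelihood is raised to the power zeta.
        With conjugate Normal updates, this simply rescales the effective sample size."""
        n_eff = zeta * n
        prec = 1.0 / tau2 + n_eff / sigma2
        mean = (mu0 / tau2 + n_eff * x.mean() / sigma2) / prec
        return mean, 1.0 / prec                  # posterior mean and variance

    alpha = 50.0                 # assumed robustness parameter; smaller means coarser
    zeta = alpha / (alpha + n)   # one calibration for relative-entropy coarsening
    print("standard posterior (mean, var):", power_posterior(1.0))
    print("coarsened posterior (mean, var):", power_posterior(zeta))

The only change relative to exact Bayes is the exponent on the likelihood, which is why the coarsened posterior is as cheap to compute as the standard one.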

April 05

Learning Deep Unsupervised and Multimodal Models

Ruslan Salakhutdinov
Carnegie Mellon University (CMU)
4:00pm–5:15pm, 34-101

Abstract: In this talk I will first introduce a broad class of unsupervised deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data, with applications in information retrieval, object recognition, and speech perception. I will next introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities, and present the Reverse Annealed Importance Sampling Estimator (RAISE) for evaluating these deep generative models. Finally, I will discuss models that can generate natural language descriptions (captions) of images and generate images from captions using attention, as well as introduce multiplicative and fine-grained gating mechanisms with applications to reading comprehension.

Bio: Ruslan Salakhutdinov received his PhD in computer science from the University of Toronto in 2009. After spending two post-doctoral years at the Massachusetts Institute of Technology Artificial Intelligence Lab, he joined the University of Toronto as an Assistant Professor in the Departments of Statistics and Computer Science. In 2016 he joined the Machine Learning Department at Carnegie Mellon University as an Associate Professor. Ruslan's primary interests lie in deep learning, machine learning, and large-scale optimization. He is an Alfred P. Sloan Research Fellow, a Microsoft Research Faculty Fellow, and a Canada Research Chair in Statistical Machine Learning; a recipient of the Early Researcher Award, the Google Faculty Award, and Nvidia's Pioneers of AI award; and a Senior Fellow of the Canadian Institute for Advanced Research.

March 23

Systems-Aware Optimization for Machine Learning at Scale

Virginia Smith
UC Berkeley

3:00pm–4:00pm, Seminar Room G575

Abstract: New computing systems have emerged in response to the increasing size and complexity of modern datasets. For best performance, machine learning methods must be designed to align closely with the underlying properties of these systems. In this talk, we illustrate the impact of systems-aware machine learning in the distributed setting, where communication remains the most significant bottleneck. We propose a general optimization framework, CoCoA, that uses local computation in a primal-dual setting to allow for a tunable, problem-specific communication scheme. The resulting framework enjoys strong convergence guarantees and exhibits state-of-the-art empirical performance in the distributed setting. We demonstrate this performance with extensive experiments in Apache Spark, achieving speedups of up to 50x over leading distributed methods for common machine learning objectives.

Bio: Virginia Smith is a 5th-year Ph.D. student in the EECS Department at UC Berkeley, where she works jointly with Michael I. Jordan and David Culler as a member of the AMPLab. Her research interests are in large-scale machine learning and optimization, with a particular focus on applications related to energy and sustainability. She is actively working to improve diversity in computer science, most recently by co-founding the Women in Technology Leadership Round Table (WiT). Virginia has won several awards and fellowships while at Berkeley, including the NSF Graduate Research Fellowship, the Google Anita Borg Memorial Scholarship, the NDSEG Fellowship, the MLConf Industry Impact Award, and Berkeley's Tong Leong Lim Pre-Doctoral Prize.
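
To make the communication pattern concrete, here is a toy single-process simulation in the CoCoA spirit (my own illustration, not the authors' implementation): each "worker" runs a few dual coordinate-ascent steps on its own partition of a ridge regression problem, and only the resulting updates are communicated and averaged once per round. The partitioning, step counts, and regularization value are arbitrary assumptions.

    # Toy, single-process illustration of a CoCoA-style scheme for ridge regression:
    # local dual coordinate ascent per data partition, then one communication round
    # that averages the local updates. Not the authors' implementation.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, lam, K = 400, 20, 0.1, 4             # examples, features, regularizer, "machines"
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    parts = np.array_split(rng.permutation(n), K)

    alpha = np.zeros(n)                        # dual variables, one per example
    w = np.zeros(d)                            # shared primal vector

    for round_ in range(20):                   # communication rounds
        dw, da = np.zeros(d), np.zeros(n)
        for idx in parts:                      # "local" work on each partition
            w_loc, a_loc = w.copy(), alpha.copy()
            for i in rng.permutation(idx):     # dual coordinate steps on local data only
                step = (y[i] - X[i] @ w_loc - a_loc[i]) / (1.0 + X[i] @ X[i] / (lam * n))
                a_loc[i] += step
                w_loc += step * X[i] / (lam * n)
            dw += w_loc - w
            da += a_loc - alpha
        w += dw / K                            # communicate: average the local updates
        alpha += da / K
    print("primal objective:", 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w)

In the real framework the amount of local work per round is the tunable knob: more local computation means fewer, larger communication rounds, which is the trade-off the abstract refers to.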

March 22

Optimization Challenges in Deep Learning

Ben Recht
UC Berkeley

4:00pm–5:00pm, 32-D463

Abstract: When training large-scale deep neural networks for pattern recognition, hundreds of hours on clusters of GPUs are required to achieve state-of-the-art performance. Improved optimization algorithms could potentially enable faster industrial prototyping and make training contemporary models more accessible. In this talk, I will attempt to distill the key difficulties in optimizing large, deep neural networks for pattern recognition. In particular, I will emphasize that many of the popularized notions of what makes these problems “hard” are not true impediments at all. I will show that it is not only easy to globally optimize neural networks, but that such global optimization remains easy when fitting completely random data. I will argue instead that the source of difficulty in deep learning is a lack of understanding of generalization. I will provide empirical evidence of high-dimensional function classes that are able to achieve state-of-the-art performance on several benchmarks without any obvious forms of regularization or capacity control. These findings reveal that traditional learning theory fails to explain why large neural networks generalize. I will close by discussing possible mechanisms to explain generalization in such large models, appealing to insights from linear predictors.

Bio: Benjamin Recht is an Associate Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Ben's research focuses on scalable computational tools for large-scale data analysis, statistical signal processing, and machine learning. He is the recipient of a Presidential Early Career Award for Scientists and Engineers, an Alfred P. Sloan Research Fellowship, the 2012 SIAM/MOS Lagrange Prize in Continuous Optimization, the 2014 Jamon Prize, and the 2015 William O. Baker Award for Initiatives in Research.
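
The random-data observation is easy to reproduce at small scale. The sketch below is my own toy version (not the experiments from the talk): it trains a small multilayer perceptron on the scikit-learn digits dataset twice, once with the true labels and once with randomly permuted labels, and typically drives training accuracy close to 100% in both cases while test accuracy on the random labels stays near chance.

    # Toy reproduction of the "fit completely random data" observation. The dataset,
    # architecture, and training budget are assumptions; the talk's experiments use
    # much larger networks and benchmarks.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    rng = np.random.default_rng(0)
    y_random = rng.permutation(y)          # destroy any relation between inputs and labels

    for name, labels in [("true labels", y), ("random labels", y_random)]:
        Xtr, Xte, ytr, yte = train_test_split(X, labels, test_size=0.3, random_state=0)
        net = MLPClassifier(hidden_layer_sizes=(512,), alpha=0.0,
                            max_iter=2000, random_state=0)
        net.fit(Xtr, ytr)                  # no explicit regularization (alpha=0)
        print(name, "| train acc:", round(net.score(Xtr, ytr), 3),
              "| test acc:", round(net.score(Xte, yte), 3))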

March 15

Measuring Sample Quality with Kernels

4:00pm–5:00pm, 32-D463

Abstract: Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD), computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein's method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences, and they are simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.
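
For readers who have not seen the quantity before, the sketch below computes a KSD in the "sum kernel evaluations over pairs of sample points" form. It is an illustration under my own assumptions (not code from the talk): the target is a standard Gaussian, so its score function is simply -x, and the kernel is an inverse multiquadric, one of the slowly decaying kernels of the kind the abstract advocates.

    # Minimal sketch: kernel Stein discrepancy (KSD) of a sample against a standard
    # Gaussian target, using an inverse multiquadric (IMQ) kernel
    #   k(x, y) = (c^2 + ||x - y||^2)^beta   with beta in (-1, 0).
    import numpy as np

    def ksd_gaussian_target(X, c=1.0, beta=-0.5):
        """X: (n, d) sample. Returns the V-statistic KSD against N(0, I)."""
        n, d = X.shape
        score = -X                                   # grad log p(x) = -x for N(0, I)
        diff = X[:, None, :] - X[None, :, :]         # pairwise differences, (n, n, d)
        sq = np.sum(diff ** 2, axis=-1)              # ||x_i - x_j||^2
        base = c ** 2 + sq
        k = base ** beta                             # kernel matrix
        gx = 2 * beta * base[..., None] ** (beta - 1) * diff   # grad_x k(x, y)
        gy = -gx                                                # grad_y k(x, y)
        trace = (-2 * beta * d * base ** (beta - 1)
                 - 4 * beta * (beta - 1) * base ** (beta - 2) * sq)
        k0 = (trace
              + np.einsum('id,ijd->ij', score, gy)   # score(x) . grad_y k
              + np.einsum('jd,ijd->ij', score, gx)   # score(y) . grad_x k
              + k * (score @ score.T))               # k * score(x) . score(y)
        return np.sqrt(k0.mean())

    rng = np.random.default_rng(0)
    on_target = rng.normal(size=(500, 2))            # drawn from the target
    biased = rng.normal(loc=0.5, size=(500, 2))      # shifted, off-target sample
    print("KSD, on-target sample:", ksd_gaussian_target(on_target))
    print("KSD, biased sample:   ", ksd_gaussian_target(biased))

The biased sample should receive a visibly larger discrepancy, which is the sense in which the KSD acts as a computable sample-quality diagnostic.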

March 08

Online Learning for Time Series Prediction

4:00pm–5:00pm, 32-D463

Abstract: Online learning is a rich and fast-growing literature, with algorithms benefitting from regret guarantees that are often tight. Can we leverage online learning theory and algorithms to devise accurate solutions for time series prediction in the stochastic setting? This talk presents a series of theoretical and algorithmic solutions addressing this question. It further shows how some notoriously difficult time series problems, such as model selection and ensemble learning, can be tackled using these ideas. (Joint work with Vitaly Kuznetsov.)
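
As a small, generic illustration of the ensemble angle (my own toy example, not the algorithms from the talk), the sketch below combines a few naive time series predictors with the classical exponentially weighted average forecaster; the synthetic series, the base predictors, and the learning rate are all assumptions made for the example.

    # Toy ensemble for time series prediction: combine naive base predictors with the
    # classical exponentially weighted average forecaster. Not the talk's algorithms.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 500
    series = np.sin(np.arange(T) / 10.0) + 0.1 * rng.normal(size=T)

    # Base predictors of y_t from the observed past y_1, ..., y_{t-1}.
    experts = [
        lambda past: past[-1],            # persistence
        lambda past: past[-5:].mean(),    # short moving average
        lambda past: 0.0,                 # constant zero
    ]
    eta = 2.0                             # learning rate (assumed)
    weights = np.ones(len(experts))
    ens_loss, expert_loss = [], np.zeros(len(experts))

    for t in range(5, T):
        past = series[:t]
        preds = np.array([f(past) for f in experts])
        forecast = weights @ preds / weights.sum()   # weighted ensemble forecast
        losses = (preds - series[t]) ** 2            # squared loss of each expert
        ens_loss.append((forecast - series[t]) ** 2)
        expert_loss += losses
        weights *= np.exp(-eta * losses)             # exponential weight update

    print("ensemble avg loss:   ", np.mean(ens_loss))
    print("best expert avg loss:", expert_loss.min() / len(ens_loss))

The regret guarantees mentioned in the abstract control the gap between these two averages in the adversarial online setting; carrying such guarantees over to stochastic time series is the question the talk addresses.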