November 28

2018-11-28, 4:00–5:00 pm (America/New_York)
Optimal Algorithms for Continuous Non-monotone Submodular Maximization

Abstract: In this talk, I will explain our recent result on designing optimal approximation algorithms for maximizing continuous non-monotone submodular functions over the hypercube. This family of optimization problems has several applications in machine learning, finance and social network analysis. Our main result is the first 1/2-approximation algorithm for this problem; this approximation factor is the best possible for algorithms that only query the objective function at polynomially many points. For the special case of DR-submodular maximization, i.e., when the submodular function is also coordinate-wise concave along all coordinates, we provide a different 1/2-approximation algorithm that runs in quasilinear time in the dimension. Both of these results improve upon prior work [Bian et al., 2017; Buchbinder et al., 2012].

Our first algorithm uses novel ideas such as reducing the guaranteed-approximation problem to analyzing a stylized zero-sum game for each coordinate, and incorporates the geometry of this zero-sum game to find a value for that coordinate. Our second algorithm exploits coordinate-wise concavity to identify a monotone equilibrium condition sufficient for obtaining the required approximation guarantee, and hunts for the equilibrium point using binary search.

The talk is based on joint work with Tim Roughgarden and Joshua Wang, accepted for an oral presentation at NIPS'18.

Bio: Rad Niazadeh is a Motwani postdoctoral fellow at Stanford University, Department of Computer Science, where he is hosted by Tim Roughgarden, Amin Saberi and Moses Charikar. Prior to Stanford, he obtained his Ph.D. in Computer Science from Cornell University under Bobby Kleinberg. During his graduate studies, he was a research intern at Microsoft Research (Redmond), Microsoft Research (New England) and Yahoo! Research. He has also been awarded the Google PhD fellowship (in market algorithms), the INFORMS Revenue Management and Pricing Dissertation Honorable Mention (runner-up), the Stanford Motwani fellowship and the Cornell Irwin Jacobs fellowship. His research interests are broadly at the intersection of algorithms, game theory and optimization, with a focus on applications in market design, machine learning, and operations research.

Location: 32-D463 (Stata Center, Star Conference Room)
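The binary-search step of the second algorithm can be illustrated in generic form. This is a minimal sketch: the function `g` below is a hypothetical stand-in for the paper's monotone equilibrium condition, not its actual definition.

```python
# Sketch: binary search for the zero crossing of a non-increasing
# "equilibrium condition" g on [0, 1]. The real condition in the paper
# is derived from coordinate-wise concavity of the DR-submodular
# objective; here g is just a toy monotone function.
def binary_search_equilibrium(g, lo=0.0, hi=1.0, iters=60):
    # Invariant: g(lo) >= 0 >= g(hi); halve the interval each step.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy equilibrium condition whose zero is at z = 0.5.
z_star = binary_search_equilibrium(lambda z: 0.5 - z)
```

Each coordinate thus costs only O(log(1/epsilon)) evaluations of the condition, which is what makes the overall running time quasilinear in the dimension.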

November 14

2018-11-14, 4:30–5:30 pm (America/New_York)
Is Learning Compatible with (Over)fitting to the Training Data?

Abstract: We revisit the basic question: can a learning method be successful if it perfectly fits (interpolates/memorizes) the data? The question is motivated by the good out-of-sample performance of ``overparametrized'' deep neural networks, which have the capacity to fit training data exactly, even if labels are randomized. The conventional wisdom in Statistics and ML is to regularize the solution and avoid data interpolation. We challenge this wisdom and propose several interpolation methods that work well, both in theory and in practice. In particular, we present a study of kernel ``ridgeless'' regression and describe a new phenomenon of implicit regularization, even in the absence of an explicit bias-variance trade-off. We will discuss the nature of successful learning with interpolation, in both regression and classification.

Bio: Alexander (Sasha) Rakhlin is an Associate Professor at MIT in the Department of Brain and Cognitive Sciences and the Statistics & Data Science Center. Sasha's research is in Statistics, Machine Learning, and Optimization. He received his bachelor’s degrees in mathematics and computer science from Cornell University, and his doctoral degree from MIT. He was a postdoc at UC Berkeley EECS before joining the University of Pennsylvania, where he was an associate professor in the Department of Statistics and a co-director of the Penn Research in Machine Learning (PRiML) center. He is a recipient of the NSF CAREER award, the IBM Research Best Paper award, the Machine Learning Journal award, and a COLT Best Paper Award.

Location: 32-155
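The ``ridgeless'' regression studied in the talk is easy to state in code: the minimum-norm kernel predictor fits every training point exactly, with no explicit regularization term. A toy sketch with an assumed Gaussian kernel and made-up data, not the talk's experimental setup:

```python
import numpy as np

def gaussian_kernel(X, Z, gamma=50.0):
    # k(x, z) = exp(-gamma * ||x - z||^2), computed via broadcasting
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def ridgeless_fit(X, y, gamma=50.0):
    # Solve K alpha = y. With distinct points the Gaussian kernel matrix
    # is positive definite, so the resulting predictor interpolates.
    return np.linalg.solve(gaussian_kernel(X, X, gamma), y)

def predict(X_train, alpha, X_test, gamma=50.0):
    return gaussian_kernel(X_test, X_train, gamma) @ alpha

# Toy data: the fitted function passes through every training point.
X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
alpha = ridgeless_fit(X, y)
train_preds = predict(X, alpha, X)
```

The talk's point is that predictors of exactly this interpolating form can nonetheless generalize, via an implicit regularization effect this sketch does not attempt to demonstrate.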

October 31

2018-10-31, 4:30–5:30 pm (America/New_York)
Towards Better Reinforcement Learning for High Stakes Domains

Abstract: There is increasing excitement about reinforcement learning -- a subarea of machine learning for enabling an agent to learn to make good decisions. Yet numerous questions and challenges remain for reinforcement learning to help support progress in important high stakes domains like education, consumer marketing and healthcare. One key question is how to leverage the ever-expanding logs of sequences of decisions made and their outcomes to identify better decision policies, such as using electronic medical record data to inform better personalized patient treatments. I will discuss some of our work on developing better statistical estimators and optimizers for this problem, from the vantage point of a reinforcement learning researcher. A second key issue is to narrow the gap between RL theory and experiment, to create online reinforcement learning algorithms with strong guarantees on their performance. In this effort, I will briefly describe our recent work on policy certificates for RL algorithms. This work is a step towards providing the types of guarantees and transparency that will enable us to confidently deploy RL in important high stakes scenarios.

Bio: Emma Brunskill is an assistant professor in the Computer Science Department at Stanford University, where she leads the AI for Human Impact (@ai4hi) group. Her work focuses on reinforcement learning in high stakes scenarios -- how can an agent learn from experience to make good decisions when experience is costly or risky, such as in educational software, healthcare decision making, robotics or people-facing applications. She was previously on the faculty at Carnegie Mellon University. She is the recipient of multiple early faculty career awards (National Science Foundation, Office of Naval Research, Microsoft Research), and her group has received several best research paper nominations (CHI, EDM x2) and awards (UAI, RLDM).

Location: 34-101
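The basic statistical tool for learning from logged decision data is off-policy importance sampling. A minimal sketch of the ordinary per-trajectory estimator, which is the standard baseline rather than the improved estimators the talk develops:

```python
import numpy as np

def is_estimate(returns, target_probs, behavior_probs):
    """Ordinary importance-sampling estimate of a target policy's value
    from trajectories logged under a behavior policy.

    returns[i]        -- total return of trajectory i
    target_probs[i]   -- per-step probs of the logged actions under the target policy
    behavior_probs[i] -- per-step probs of the logged actions under the logging policy
    """
    vals = []
    for g, pi, mu in zip(returns, target_probs, behavior_probs):
        w = float(np.prod(np.asarray(pi) / np.asarray(mu)))  # trajectory weight
        vals.append(w * g)
    return float(np.mean(vals))

# If the target equals the behavior policy, all weights are 1 and the
# estimate is just the empirical mean return.
on_policy = is_estimate([1.0, 2.0, 3.0], [[0.5], [0.5], [0.5]], [[0.5], [0.5], [0.5]])

# Reweighting: the target policy makes the first trajectory more likely.
reweighted = is_estimate([1.0, 0.0], [[0.8], [0.2]], [[0.5], [0.5]])
```

The products of probability ratios make this estimator unbiased but high variance on long horizons, which is precisely the problem better estimators and optimizers aim to address.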

October 17

2018-10-17, 4:30–5:30 pm (America/New_York)
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes

Abstract: We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform better in practice in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, we show that for hard problems multiple passes lead to statistically optimal predictions while a single pass does not; we also show that in these hard models, the optimal number of passes over the data increases with the sample size. In order to define the notion of hardness and show that our predictive performances are optimal, we consider potentially infinite-dimensional models and notions typically associated with kernel methods, namely, the decay of the eigenvalues of the covariance matrix of the features and the complexity of the optimal predictor as measured through the covariance matrix. We illustrate our results on synthetic experiments with non-linear kernel methods and on a classical benchmark with a linear model. (Joint work with Loucas Pillaud-Vivien and Alessandro Rudi.)

Bio: Francis Bach is a researcher at Inria, leading since 2011 the machine learning team, which is part of the Computer Science Department at Ecole Normale Supérieure. He graduated from Ecole Polytechnique in 1997 and completed his Ph.D. in Computer Science at U.C. Berkeley in 2005, working with Professor Michael Jordan. He spent two years in the Mathematical Morphology group at Ecole des Mines de Paris, then joined the computer vision project-team at Inria/Ecole Normale Supérieure from 2007 to 2010. Francis Bach is primarily interested in machine learning, and especially in graphical models, sparse methods, kernel-based learning, large-scale convex optimization, computer vision and signal processing. He obtained a Starting Grant in 2009 and a Consolidator Grant in 2016 from the European Research Council, and received the Inria young researcher prize in 2012, the ICML test-of-time award in 2014, as well as the Lagrange prize in continuous optimization in 2018. In 2015, he was program co-chair of the International Conference on Machine Learning (ICML), and general chair in 2018; he is now co-editor-in-chief of the Journal of Machine Learning Research.

Location: 6-120
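The talk's setting, SGD for least squares with repeated passes over the same sample, is easy to simulate. A toy sketch with synthetic noiseless data and an arbitrary step size, purely to show that extra passes keep reducing the empirical error; the talk's actual result concerns statistical risk on hard kernel problems, which this does not reproduce:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
w_true = np.array([1.0, -2.0])
y = X @ w_true                      # noiseless targets: the model can interpolate

def sgd_passes(X, y, n_passes, lr=0.05):
    """Constant-step SGD for least squares, sweeping the sample n_passes times."""
    w = np.zeros(X.shape[1])
    for _ in range(n_passes):
        for xi, yi in zip(X, y):    # one pass = one sweep over the data
            w -= lr * (xi @ w - yi) * xi
    return w

def mse(w):
    return float(np.mean((X @ w - y) ** 2))

w_single = sgd_passes(X, y, 1)
w_multi = sgd_passes(X, y, 100)
```

On this easy consistent problem more passes simply drive the training error toward zero; the subtle part of the talk is that on hard problems more passes also improve the *test* risk, against the single-pass folklore.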

October 10

Is Q-learning Provably Efficient?

Chi Jin
University of California, Berkeley
2018-10-10, 4:00–5:00 pm (America/New_York)

Abstract: Have you ever wondered how many samples Q-learning needs to learn a good policy? Is epsilon-greedy a good exploration strategy for reinforcement learning, or is there a better alternative? Although these questions are fundamental, they are not well understood even in the basic scenario with finitely many states and actions. In this talk, I will present our recent effort to answer both of the above questions with a provably sample-efficient version of Q-learning. This paper won the best paper award at the ICML 2018 workshop "Exploration in RL".

Bio: Chi Jin is a Ph.D. candidate in Computer Science at UC Berkeley, advised by Michael Jordan. He received his B.S. in Physics from Peking University in 2012. His research primarily focuses on learning problems and optimization algorithms in the non-convex setting. His representative works include guarantees for gradient descent / accelerated gradient descent to efficiently escape saddle points, and the optimization landscape of low-rank problems. He has also recently become interested in reinforcement learning problems.

Location: 32-G882 (Stata Center - Hewlett Room)
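As a reference point for the sample-efficiency question, here is the basic tabular Q-learning update on a toy deterministic two-state MDP, with exhaustive sweeps standing in for exploration. The example is hypothetical; the paper's contribution is achieving efficiency with a UCB-style exploration bonus and carefully chosen learning rates, which this sketch omits:

```python
import numpy as np

# Toy deterministic MDP: states {0, 1}; action 0 stays put (reward 0),
# action 1 moves to the other state, with reward 1 only when leaving state 0.
def step(s, a):
    if a == 0:
        return s, 0.0
    return 1 - s, (1.0 if s == 0 else 0.0)

gamma, lr = 0.9, 0.5
Q = np.zeros((2, 2))                    # Q[state, action]
for _ in range(500):                    # exhaustive sweeps, not real exploration
    for s in (0, 1):
        for a in (0, 1):
            s2, r = step(s, a)
            Q[s, a] += lr * (r + gamma * Q[s2].max() - Q[s, a])

# Optimal value of action 1 in state 0 solves Q*(0,1) = 1 + gamma^2 * Q*(0,1),
# i.e. Q*(0,1) = 1 / (1 - 0.81).
```

Visiting every state-action pair uniformly is exactly what a real agent cannot do; replacing these sweeps with epsilon-greedy or optimistic exploration is where the sample-complexity question the talk answers begins.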

September 19

2018-09-19, 4:00–5:00 pm (America/New_York)
Adversaries, Extrapolation, and Language

Abstract: Despite their excellent performance on benchmarks, state-of-the-art machine learning systems generalize poorly out of domain and are vulnerable to adversarial examples. We show this on the SQuAD question answering benchmark and suggest two general directions for improvement. First, working in an unsupervised learning setting can promote the development of models with better inductive biases. Specifically, we show how to learn an end-to-end neural network based on message passing that can solve SAT instances, given only instances labeled with whether they have a solution. Second, we show how to use natural language as a means of providing stronger supervision. In one project, we convert natural language explanations to a function that labels unlabeled data, which can be used to train a predictor. In another, users interactively teach high-level concepts using natural language definitions. While far from a complete solution, we hope that these vignettes are suggestive of broader ideas worth exploring.

Bio: Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).

Location: 32-D463 (Stata Center - Star Conference Room)

September 12

2018-09-12, 3:30–4:30 pm (America/New_York)
When Recurrent Models Don't Need To Be Recurrent

Abstract: We show that stable recurrent neural networks are well approximated by feed-forward networks for the purposes of both inference and training by gradient descent. Our result applies to a broad range of non-linear recurrent neural networks under a natural stability condition, which we observe is also necessary. Complementing our theoretical findings, we verify the conclusions of our theory on both real and synthetic tasks. Furthermore, we demonstrate that recurrent models satisfying the stability assumption of our theory can have excellent performance on real sequence learning tasks. Joint work with John Miller (UC Berkeley).

Short bio: Moritz Hardt is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research aims to make the practice of machine learning more robust, reliable, and aligned with societal values. After obtaining a PhD in Computer Science from Princeton University in 2011, Hardt was a postdoctoral scholar and research staff member at IBM Research Almaden, followed by two years as a research scientist at Google Research and Google Brain.

Location: 32-D463 (Stata Center, Star Conference Room)
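The claim is easy to see numerically for a stable recurrent model: when the recurrent weight matrix has spectral norm below one, the hidden state forgets the distant past geometrically, so feeding only a recent window of inputs (a feed-forward computation) reproduces the full unroll almost exactly. A small sketch with made-up dimensions and a tanh RNN:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, k = 8, 200, 40                               # state size, seq length, window
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.svd(W, compute_uv=False)[0]   # rescale: spectral norm 0.5 (stable)
U = rng.standard_normal((d, d))
x = rng.standard_normal((T, d))

def unroll(inputs):
    # h_t = tanh(W h_{t-1} + U x_t), started from the zero state
    h = np.zeros(d)
    for xt in inputs:
        h = np.tanh(W @ h + U @ xt)
    return h

h_full = unroll(x)          # recurrent: the entire history
h_window = unroll(x[-k:])   # "feed-forward": only the last k inputs
gap = float(np.max(np.abs(h_full - h_window)))
```

Since tanh is 1-Lipschitz, the discrepancy is at most (0.5)^k times the norm of the forgotten state, which is astronomically small here; instability (spectral norm above one) is exactly what breaks this truncation argument.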

June 05

2018-06-05, 2:00–3:00 pm (America/New_York)
Noisy Natural Gradient as Variational Inference

Abstract: Variational Bayesian neural nets combine the flexibility of deep learning with Bayesian uncertainty estimation. Unfortunately, there is a tradeoff between cheap but simple variational families (e.g. fully factorized) and expensive, complicated inference procedures. We show that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO). This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern conv nets. On standard regression benchmarks, our noisy K-FAC algorithm makes better predictions and matches Hamiltonian Monte Carlo’s predictive variances better than existing methods. Its improved uncertainty estimates lead to more efficient exploration in active learning and in reinforcement learning with sparse rewards.

Bio: Roger Grosse is an Assistant Professor of Computer Science at the University of Toronto and the Vector Institute. Previously, he received his Ph.D. in Computer Science from MIT, studying with Bill Freeman and Josh Tenenbaum, and then was a Postdoctoral Fellow at the University of Toronto, working with Ruslan Salakhutdinov. He holds a Canada Research Chair in Probabilistic Inference and Deep Learning. He is also the co-creator of Metacademy, a web site for automatically generating personalized learning plans.

Location: 32-G882 (Stata Center - 8th Floor Hewlett Room)
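To see what "fitting a variational posterior by maximizing the ELBO" means in the simplest possible case, here is a conjugate toy: a Gaussian mean with a Gaussian prior, where gradient ascent on a closed-form ELBO recovers the exact posterior. This uses ordinary gradients on a one-parameter model, not the paper's noisy natural gradient or K-FAC machinery, and the data are made up:

```python
import math

# Model: x_i ~ N(mu, 1), prior mu ~ N(0, 1); variational family q = N(m, s^2).
# For this model the ELBO gradients are closed-form:
#   d/dm ELBO = sum(x) - (n + 1) * m
#   d/ds ELBO = 1/s - (n + 1) * s
x = [1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.3, 0.7, 1.0, 1.0]
n = len(x)
m, s = 0.0, 1.0
lr = 0.5 / (n + 1)
for _ in range(500):
    m += lr * (sum(x) - (n + 1) * m)
    s += lr * (1.0 / s - (n + 1) * s)

# Conjugacy gives the exact posterior N(sum(x)/(n+1), 1/(n+1));
# gradient ascent on the ELBO converges to exactly these values.
exact_mean = sum(x) / (n + 1)
exact_std = math.sqrt(1.0 / (n + 1))
```

The talk's observation is that for neural networks the same maximization can be performed implicitly, by running natural gradient (or Adam, or K-FAC) with appropriately injected weight noise.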

May 09

2018-05-09, 4:30–5:30 pm (America/New_York)
Reproducibility in Deep Reinforcement Learning and Beyond

Abstract: In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state of the art are meaningful. In this talk, I will discuss challenges posed by reproducibility, proper experimental techniques, and reporting procedures. I will illustrate the variability in reported metrics and results when comparing against common baselines, and suggest guidelines to make future results in deep RL more reproducible. I will also comment on findings from the ICLR 2018 reproducibility challenge.

Bio: Joelle Pineau is an Associate Professor and William Dawson Scholar at McGill University, where she co-directs the Reasoning and Learning Lab. She is also the director of the Facebook Artificial Intelligence Research lab in Montreal. Dr. Pineau’s research focuses on developing new models and algorithms for planning and learning in complex partially observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. She serves on the editorial boards of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research, and is currently President of the International Machine Learning Society. She is an AAAI Fellow, a Senior Fellow of CIFAR, and a member of the College of New Scholars, Artists and Scientists of the Royal Society of Canada.

Location: 32-141 (Stata Center, 1st Floor)
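One concrete piece of the "significance metrics" recommendation is easy to operationalize: compare two algorithms' per-seed returns with a test statistic rather than a bare pair of means. A minimal Welch's t-statistic on hypothetical per-seed scores (just the statistic; a full analysis would also compute degrees of freedom and a p-value, e.g. via `scipy.stats.ttest_ind` with `equal_var=False`):

```python
import math

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances,
    e.g. per-seed returns of two deep RL algorithms."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)   # unbiased sample variances
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical per-seed returns of two methods:
t = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.5, 4.5, 5.5])
```

With only a handful of seeds and high per-seed variance, such statistics are routinely far from significance, which is exactly the reporting problem the talk highlights.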

May 07

Responsibility Sensitive Safety of Self-Driving Cars

Shai Shalev-Shwartz
Hebrew University of Jerusalem, Mobileye
2018-05-07, 3:00–4:00 pm (America/New_York)

Abstract: In recent years, car makers and tech companies have been racing towards self-driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of this talk is to add two additional crucial parameters to the equation. The first is standardization of safety assurance --- what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements? The second parameter is scalability --- engineering solutions that lead to runaway costs will not scale to millions of cars, which will push interest in this field into a niche academic corner, and drive the entire field into a "winter of autonomous driving". In the first part of the talk I will show why statistical guarantees provide only a very weak notion of safety, and will propose instead a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). Time permitting, the second part of the talk will cover reinforcement learning techniques for a scalable driving policy. Joint work with Shaked Shammah and Amnon Shashua.

Bio: Shai Shalev-Shwartz is VP of Technology at Mobileye, a Senior Fellow at Intel, and a professor in the Faculty of Computer Science and Engineering at the Hebrew University of Jerusalem. Prof. Shalev-Shwartz is the author of the book “Online Learning and Online Convex Optimization,” and a co-author of the book “Understanding Machine Learning: From Theory to Algorithms”. His research centers on the theoretical foundations of machine learning as well as crafting theoretically justified optimization algorithms that make learning more efficient. He has received several best paper awards, was listed in the AMiner top 100 most influential scholars list of 2016, and was ranked 17th in TheMarker's list of the 100 most influential people in Israel, under the title “the brain behind artificial intelligence”, acknowledging his contributions to Mobileye's technology.

Location: 32-G449 (Stata Center - Patil/Kiva Conference Room)
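At the heart of RSS is a closed-form minimum safe following distance: the rear car may accelerate at up to a_max during its response time rho, then brakes at least at b_min, while the front car may brake as hard as b_max. A sketch of the formula from the RSS paper (the parameter values below are hypothetical, chosen only for illustration):

```python
def rss_min_safe_distance(v_rear, v_front, rho, a_max, b_min, b_max):
    """Minimum safe longitudinal distance in the RSS model (units: m, s, m/s, m/s^2).
    Worst case: the rear car accelerates at a_max for rho seconds before braking
    at b_min, while the front car brakes at b_max throughout."""
    d = (v_rear * rho
         + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(d, 0.0)   # a safe distance is never negative

# Hypothetical scenario: both cars at 10 m/s, 1 s response time.
d_safe = rss_min_safe_distance(v_rear=10, v_front=10, rho=1, a_max=2, b_min=2, b_max=4)
```

Because the rule is a closed-form function of interpretable physical parameters, its safety claim can be audited directly, which is the white-box contrast with statistical guarantees that the talk draws.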

May 02

Kernel Methods for Representing Probabilities

Arthur Gretton
Gatsby Computational Neuroscience Unit, University College London
2018-05-02, 4:00–5:00 pm (America/New_York)

Abstract: I will provide an introduction to kernel tools developed to represent and compare probability distributions, and to generate samples with properties that match a reference sample. I'll begin by defining the Maximum Mean Discrepancy (MMD), which is a measure of similarity between kernel probability representations. The MMD may be defined both as a distance between features and as an integral probability metric (similar to a Wasserstein distance). Since each kernel defines a distance measure, the kernel can be chosen to optimise performance on a task, such as hypothesis testing. The MMD may also be used as a critic in a generative adversarial network, where it provides a measure of similarity between the generated and reference samples: the generator attempts to reduce the MMD by generating more plausible samples, while the MMD is made more sensitive through adaptive feature/kernel choice. Time permitting, I'll discuss additional applications of kernel distribution representations, such as model criticism and measuring statistical dependence.

Bio: Arthur Gretton is a Professor with the Gatsby Computational Neuroscience Unit, UCL. He received degrees in physics and systems engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing Laboratory at the University of Cambridge. He previously worked at the MPI for Biological Cybernetics and at the Machine Learning Department, Carnegie Mellon University. Arthur's research interests include machine learning, kernel methods, statistical learning theory, nonparametric hypothesis testing, blind source separation, and non-parametric techniques for neural data analysis. He was an associate editor at IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, and has been an Action Editor for JMLR since April 2013; he was a member of the NIPS Program Committee in 2008 and 2009, a Senior Area Chair for NIPS in 2018, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013. Arthur was co-chair of AISTATS in 2016 (with Christian Robert), and co-tutorials chair of ICML in 2018 (with Ruslan Salakhutdinov).

Location: 32-G882 (Hewlett Room, 8th Floor, Stata Center)
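The MMD described above admits a simple unbiased estimator from two samples: average the within-sample kernel values (excluding the diagonal) and subtract twice the cross-sample average. A sketch with a Gaussian kernel at a fixed bandwidth; the talk's point about adaptive kernel choice is not shown:

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), via broadcasting
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2: drop the diagonals of the
    within-sample Gram matrices."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = (gaussian_gram(A, B, sigma) for A, B in ((X, X), (Y, Y), (X, Y)))
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y = rng.normal(0.0, 1.0, size=(200, 1))   # same distribution as X
Z = rng.normal(3.0, 1.0, size=(200, 1))   # shifted distribution
same = mmd2_unbiased(X, Y)                # near zero (can be slightly negative)
different = mmd2_unbiased(X, Z)           # clearly positive
```

Because the estimator is unbiased, it can dip below zero under the null; hypothesis tests built on it calibrate a threshold for exactly this statistic.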

April 25

2018-04-25, 4:30–5:30 pm (America/New_York)
Nonvacuous Generalization Bounds for Deep Neural Networks via PAC-Bayes

Abstract: A serious impediment to a rigorous understanding of the generalization performance of algorithms like SGD for neural networks is that most generalization bounds are numerically vacuous when applied to modern networks on real data sets. In recent work (Dziugaite and Roy, UAI 2017), we argue that it is time to revisit the problem of computing nonvacuous bounds, and show how the empirical phenomenon of "flat minima" can be operationalized using PAC-Bayesian bounds, yielding the first nonvacuous bounds for a large (stochastic) neural network on MNIST. The bound is obtained by first running SGD and then optimizing the distribution of a random perturbation of the weights so as to capture the flatness and minimize the PAC-Bayes bound. I will describe this work, its antecedents, its goals, and subsequent work, focusing on where others have and have not made progress towards understanding generalization according to our strict criteria. Joint work with Gintare Karolina Dziugaite, based on https://arxiv.org/abs/1703.11008, https://arxiv.org/abs/1712.09376, and https://arxiv.org/abs/1802.09583.

Bio: Daniel Roy is an Assistant Professor in the Department of Statistical Sciences and, by courtesy, Computer Science at the University of Toronto, and a founding faculty member of the Vector Institute for Artificial Intelligence. Daniel is a recent recipient of an Ontario Early Researcher Award and a Google Faculty Research Award. Before joining U of T, Daniel held a Newton International Fellowship from the Royal Academy of Engineering and a Research Fellowship at Emmanuel College, University of Cambridge. Daniel earned his S.B., M.Eng., and Ph.D. from the Massachusetts Institute of Technology; his dissertation on probabilistic programming won an MIT EECS Sprowls Dissertation Award. Daniel's group works on the foundations of machine learning and statistics.

Location: 32-141 (Stata Center, 1st floor)
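Numerically, a PAC-Bayes bound of this kind turns three quantities, the stochastic network's training error, the KL divergence between posterior and prior, and the sample size, into a certified test-error bound by inverting the binomial KL. A sketch of the arithmetic in the McAllester/Langford form; the example numbers below are made up, not the paper's MNIST values:

```python
import math

def kl_bernoulli(q, p):
    # kl(q || p) between Bernoulli(q) and Bernoulli(p), clipped for stability
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_error_bound(train_err, kl_qp, m, delta):
    """Smallest p >= train_err with kl(train_err || p) <= (KL(Q||P) + ln(2 sqrt(m)/delta)) / m,
    found by binary search (kl(q || p) is increasing in p for p >= q)."""
    rhs = (kl_qp + math.log(2 * math.sqrt(m) / delta)) / m
    lo, hi = train_err, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(train_err, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi

# Made-up numbers: 3% training error, KL of 5000 nats, 55,000 examples.
bound = pac_bayes_error_bound(0.03, 5000.0, 55000, 0.05)
```

The bound is nonvacuous exactly when this inverted value stays well below 1, which is why minimizing the KL term by optimizing the perturbation distribution is the crux of the approach.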

April 18

2018-04-18, 3:00–4:00 pm (America/New_York)
Using Interaction for Simpler and Better Learning

Abstract: In the usual setup of supervised learning, the learner is given a stack of labeled examples and told to fit a classifier to them. It would be quite unnatural for a human to learn in this way, and indeed this model is known to suffer from a variety of fundamental hardness barriers. However, many of these hurdles can be overcome by moving to a setup in which the learner interacts with a human (or other information source) during the learning process. We will see how interaction makes it possible to:

1. Learn DNF (disjunctive normal form) concepts.
2. Perform machine teaching in situations where the student’s concept class is unknown.
3. Improve the results of unsupervised learning.

We will present a generic approach to “interactive structure learning” that, for instance, yields simple interactive algorithms for topic modeling and hierarchical clustering. Along the way, we will present a novel cost function for hierarchical clustering, as well as an efficient algorithm for approximately minimizing this cost.

Bio: Sanjoy Dasgupta is a Professor in the Department of Computer Science and Engineering at UC San Diego. He works on algorithms for machine learning, with a focus on unsupervised and interactive learning. 

Location: 32-G882 (Stata Center - 8th Floor Hewlett Room)
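A cost function of the kind mentioned at the end charges each edge of a similarity graph the size of the subtree rooted at its endpoints' lowest common ancestor, so strongly similar points should be merged low in the tree. A small sketch of this cost (matching the form introduced in the speaker's earlier hierarchical-clustering work; the graph and trees below are toy examples, and the implementation is naive enumeration, fine only for tiny inputs):

```python
def dasgupta_cost(tree, weights):
    """Cost of a hierarchical clustering: sum over edges (u, v) of
    w(u, v) times the number of leaves under the lowest common ancestor.
    Trees are nested tuples with integer leaves, e.g. ((0, 1), (2, 3))."""
    total = 0.0

    def rec(node):
        nonlocal total
        if isinstance(node, int):
            return {node}
        child_sets = [rec(child) for child in node]
        node_leaves = set().union(*child_sets)
        k = len(node_leaves)
        # Edges crossing between different children have this node as their LCA.
        for i in range(len(child_sets)):
            for j in range(i + 1, len(child_sets)):
                for u in child_sets[i]:
                    for v in child_sets[j]:
                        total += weights.get((min(u, v), max(u, v)), 0.0) * k
        return node_leaves

    rec(tree)
    return total

# Toy similarity graph: 0-1 and 2-3 are strongly similar, 0-2 weakly.
w = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.1}
good = dasgupta_cost(((0, 1), (2, 3)), w)   # merges the similar pairs first
bad = dasgupta_cost(((0, 2), (1, 3)), w)    # splits the similar pairs
```

The tree that merges the heavy edges near the leaves pays less, so minimizing this cost (the hard part, for which the talk gives an approximation algorithm) recovers sensible hierarchies.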