Planning from a Few Samples

Speaker

Geoff Gordon
Microsoft Research Montreal

Host

Stefanie Jegelka
MIT CSAIL

Abstract:
We study the problem of planning when experience is expensive: we are given a few trajectories sampled from our target environment, and we wish to compute a good policy or value function. In such problems, we can’t afford to use “deep RL” methods like DQN or REINFORCE: these methods can need millions or even billions of samples to reach a useful solution. On the other hand, we are willing to assume that we can gather a representative sample of trajectories; e.g., we might have access to an expert who can steer the system into interesting regions of state space. Given these problem characteristics, we seek an algorithm that is reliable, fast, and data-efficient. To this end, we design a new approximate optimality criterion for planning: we frame the problem as a fixed-point iteration based on a variational inequality, and use projection and subsampling to reduce the problem to a tractable size. We demonstrate that our approach can solve simple planning problems quickly and reliably from moderate amounts of data.

Bio:
Dr. Gordon is Research Director of Microsoft Research Montreal. He is on leave from the Department of Machine Learning at Carnegie Mellon University, where he has served as Professor, Interim Department Head, and Associate Department Head for Education. His research interests include artificial intelligence, statistical machine learning, game theory, multi-robot systems, and planning in probabilistic, adversarial, and general-sum domains. His previous appointments include Visiting Professor at the Stanford Computer Science Department and Principal Scientist at Burning Glass Technologies in San Diego. Dr. Gordon received his B.A. in Computer Science from Cornell University in 1991, and his Ph.D. in Computer Science from Carnegie Mellon University in 1999.