Planning from a Few Samples

Speaker

Geoff Gordon

Microsoft Research Montreal

Host

Stefanie Jegelka

MIT CSAIL

Abstract:
We study the problem of planning when experience is expensive: we are given a few trajectories sampled from our target environment, and we wish to compute a good policy or value function. In such problems, we can’t afford to use “deep RL” methods like DQN or REINFORCE: these methods can need millions or even billions of samples to reach a useful solution. On the other hand, we are willing to assume that we can gather a representative sample of trajectories; e.g., we might have access to an expert who can steer the system into interesting regions of state space. Given these problem characteristics, we seek an algorithm that is reliable, fast, and data-efficient. To this end, we design a new approximate optimality criterion for planning: we frame the problem as a fixed-point iteration based on a variational inequality, and use projection and subsampling to reduce the problem to a tractable size. We demonstrate that our approach can solve simple planning problems quickly and reliably from moderate amounts of data.
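The abstract describes the method only at a high level, so the sketch below does not reproduce the speaker's variational-inequality approach. It instead illustrates the two ingredients the abstract names, projection and subsampling, using a standard and much simpler technique: fitted Q-iteration with state aggregation on a small fixed batch of sampled transitions. The toy chain environment, the helpers `collect`, `fitted_q_iteration` and `greedy_policy`, and every parameter value are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy problem (an assumption, not from the talk): a deterministic
# 10-state chain. Action 0 moves left, action 1 moves right (clamped at the ends),
# and a reward of 1 is paid whenever a transition lands in the rightmost state.
N_STATES, GAMMA = 10, 0.9
CELL_SIZE = 2  # state aggregation: states 2c and 2c+1 share feature "cell" c
N_CELLS = N_STATES // CELL_SIZE


def step(s, a):
    """One environment transition; returns (reward, next_state)."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next


def collect(n, rng):
    """Fixed batch of (s, a, r, s') tuples from uniformly random states and actions,
    standing in for a 'representative sample' of experience."""
    data = []
    for _ in range(n):
        s, a = rng.randrange(N_STATES), rng.randrange(2)
        r, s_next = step(s, a)
        data.append((s, a, r, s_next))
    return data


def fitted_q_iteration(data, iters=50, subsample=300, seed=0):
    """Projected fixed-point iteration Q <- Proj(T_hat Q) on random subsamples."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[cell][action]
    for _ in range(iters):
        batch = rng.sample(data, min(subsample, len(data)))  # subsample the data
        sums = [[0.0, 0.0] for _ in range(N_CELLS)]
        counts = [[0, 0] for _ in range(N_CELLS)]
        for s, a, r, s_next in batch:
            # Empirical Bellman optimality backup using the current estimate.
            target = r + GAMMA * max(q[s_next // CELL_SIZE])
            sums[s // CELL_SIZE][a] += target
            counts[s // CELL_SIZE][a] += 1
        # Least-squares projection onto piecewise-constant features reduces to
        # averaging the targets inside each (cell, action) pair; pairs with no
        # samples in this subsample keep their previous value.
        q = [[sums[c][a] / counts[c][a] if counts[c][a] else q[c][a]
              for a in range(2)] for c in range(N_CELLS)]
    return q


def greedy_policy(q):
    """Greedy action per state, read off the aggregated Q-table."""
    return [max(range(2), key=lambda a: q[s // CELL_SIZE][a])
            for s in range(N_STATES)]


data = collect(1000, random.Random(42))
q = fitted_q_iteration(data)
policy = greedy_policy(q)
print(policy)
```

On this toy chain the greedy policy should choose action 1 (move right) in every state, even though each iteration sees only 300 of the 1,000 sampled transitions and the value function is represented with five aggregated cells instead of ten states. Aggregation is used here only because its least-squares projection is a simple per-cell average; a real planner would typically use richer features.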

Bio:
Dr. Gordon is Research Director of Microsoft Research Montreal. He is on leave from the Department of Machine Learning at Carnegie Mellon University, where he has served as Professor, Interim Department Head, and Associate Department Head for Education. His research interests include artificial intelligence, statistical machine learning, game theory, multi-robot systems, and planning in probabilistic, adversarial, and general-sum domains. His previous appointments include Visiting Professor at the Stanford Computer Science Department and Principal Scientist at Burning Glass Technologies in San Diego. Dr. Gordon received his B.A. in Computer Science from Cornell University in 1991, and his Ph.D. in Computer Science from Carnegie Mellon University in 1999.

Date

March 20 2019, 15:00 to 16:00 (America/New_York)

Location

32-D463 (Stata Center - Star Conference Room)

Organizer & Contact

Marcia G. Davidson

marcia@csail.mit.edu

617-253-3049

Part of

Machine Learning Seminar Series 2019

Related Events

December 05

The Non-Stochastic Control Problem

October 29

David Spiegelhalter: Communicating uncertainty about facts, numbers and science