Sample-Efficient Reinforcement Learning with Rich Observations

Speaker

Alekh Agarwal

Microsoft Research

Host

Stefanie Jegelka

MIT CSAIL

This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations? We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups to study RL such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity and is small in several practical settings. In particular, we propose an algorithm, whose sample complexity scales with the Bellman Rank of the process, and is completely independent of the size of the observation space of the agent. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results.

This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.

Alekh Agarwal is a researcher in the New York lab of Microsoft Research, prior to which he obtained his PhD from UC Berkeley. Alekh’s research currently focuses on topics in interactive machine learning, including contextual bandits, reinforcement learning and online learning. Previously, he has worked on several topics in optimization including stochastic and distributed optimization. He has won several awards for his research including the NIPS 2015 best paper award.

Add to Calendar 2017-09-20 16:00:00 2017-09-20 17:00:00 America/New_York Sample-Efficient Reinforcement Learning with Rich Observations This talk considers a core question in reinforcement learning (RL): How can we tractably solve sequential decision making problems where the learning agent receives rich observations? We begin with a new model called Contextual Decision Processes (CDPs) for studying such problems, and show that it encompasses several prior setups to study RL such as MDPs and POMDPs. Several special cases of CDPs are, however, known to be provably intractable in their sample complexities. To overcome this challenge, we further propose a structural property of such processes, called the Bellman Rank. We find that the Bellman Rank of a CDP (and an associated class of functions) provides an intuitive measure of the hardness of a problem in terms of sample complexity and is small in several practical settings. In particular, we propose an algorithm, whose sample complexity scales with the Bellman Rank of the process, and is completely independent of the size of the observation space of the agent. We also show that our techniques are robust to our modeling assumptions, and make connections to several known results as well as highlight novel consequences of our results.This talk is based on joint work with Nan Jiang, Akshay Krishnamurthy, John Langford and Rob Schapire.Alekh Agarwal is a researcher in the New York lab of Microsoft Research, prior to which he obtained his PhD from UC Berkeley. Alekh’s research currently focuses on topics in interactive machine learning, including contextual bandits, reinforcement learning and online learning. Previously, he has worked on several topics in optimization including stochastic and distributed optimization. He has won several awards for his research including the NIPS 2015 best paper award. 32-G882 (Stata Center - Hewlett Room)

Organizer & Contact

Marcia G. Davidson

marcia@csail.mit.edu

617-253-3049

Part of

Machine Learning Seminar Series

Sample-Efficient Reinforcement Learning with Rich Observations

Speaker

Host

September 20 2017

Location

Organizer & Contact

Part of

September 18

Libratus: Beating Top Humans in No-Limit Poker

October 11

Tutorial on Deep Learning with Apache MXNet Gluon

Sample-Efficient Reinforcement Learning with Rich Observations

Speaker

Host

September 20 2017

Location

Organizer & Contact

Part of

Related Events

September 18

Libratus: Beating Top Humans in No-Limit Poker

October 11

Tutorial on Deep Learning with Apache MXNet Gluon