Learning a Driving Model from Imperfect Demonstrations

Speaker:

Huazhe (Harry) Xu
UC Berkeley
Abstract:

Robust real-world learning should benefit from both demonstrations and interaction with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on reward from the environment. These tasks have divergent losses that are difficult to optimize jointly; moreover, such methods can be very sensitive to noisy demonstrations. We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data. NAC learns an initial policy network from demonstrations and refines the policy in a real environment. Crucially, both learning from demonstration and interactive refinement use exactly the same objective, unlike prior approaches that combine distinct supervised and reinforcement losses. This makes NAC robust to suboptimal demonstration data, since the method is not forced to mimic all of the examples in the dataset. We show that our unified reinforcement learning algorithm can learn robustly and outperform existing baselines when evaluated on several realistic driving games.
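
To make the normalization idea concrete, the sketch below shows one way a soft-Q-style update with this flavor could look for discrete actions: the soft value V(s) = alpha * logsumexp(Q(s, .) / alpha) ties all actions together, so fitting the demonstrated action's Q-value also suppresses the Q-values of actions absent from the data, and the same loss can be applied to demonstration transitions and environment rollouts alike. The network, function names, and hyperparameters (alpha, gamma) are illustrative assumptions, not the authors' implementation; the exact NAC objective is given in the paper.

    # Illustrative sketch (not the authors' implementation) of a normalized,
    # soft-Q-style update in the spirit of NAC; all names, shapes, and the exact
    # loss form below are assumptions for the discrete-action case.
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        def __init__(self, obs_dim, n_actions, hidden=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, obs):
            return self.net(obs)  # Q(s, .) over all discrete actions

    def normalized_q_loss(q_net, obs, act, rew, next_obs, done,
                          alpha=0.1, gamma=0.99):
        # Q-values of the batch states and of the actions actually taken
        # (which may come from demonstrations or from environment rollouts).
        q_all = q_net(obs)                                  # [B, A]
        q_taken = q_all.gather(1, act.unsqueeze(1)).squeeze(1)

        # Soft state value V(s) = alpha * logsumexp(Q(s, .) / alpha): it couples
        # the Q-values of *all* actions, which is what provides normalization.
        v = alpha * torch.logsumexp(q_all / alpha, dim=1)

        with torch.no_grad():                               # soft Bellman target
            next_v = alpha * torch.logsumexp(q_net(next_obs) / alpha, dim=1)
            target = rew + gamma * (1.0 - done) * next_v

        # Regress both Q(s, a_taken) and V(s) toward the same target: pushing up
        # the demonstrated action's Q while holding V near the target implicitly
        # pushes down the (softmax-weighted) Q-values of unseen actions. The same
        # loss is used for demonstration data and for environment interaction.
        return 0.5 * ((q_taken - target) ** 2).mean() + \
               0.5 * ((v - target) ** 2).mean()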

Bio:
Huazhe (Harry) Xu is a Ph.D. student advised by Prof. Trevor Darrell in the Berkeley Artificial Intelligence Research (BAIR) Lab at the University of California, Berkeley (UC Berkeley). He received his B.S.E. degree in Electrical Engineering from Tsinghua University in 2016. His research focuses on computer vision, reinforcement learning, and their applications, such as autonomous driving.