Towards Better Reinforcement Learning for High Stakes Domains

Speaker

Emma Brunskill
Stanford University

Host

Tamara Broderick
MIT CSAIL

Abstract
There is increasing excitement about reinforcement learning -- a subarea of machine learning focused on enabling an agent to learn to make good decisions. Yet numerous questions and challenges remain before reinforcement learning can support progress in important high-stakes domains like education, consumer marketing, and healthcare. One key question is how to leverage the ever-expanding logs of sequences of decisions made and their outcomes to identify better decision policies, such as using electronic medical record data to inform better personalized patient treatments. I will discuss some of our work on developing better statistical estimators and optimizers for this problem, from the vantage point of a reinforcement learning researcher. A second key issue is narrowing the gap between RL theory and experiment, to create online reinforcement learning algorithms with strong guarantees on their performance. In this effort, I will briefly describe our recent work on policy certificates for RL algorithms. This work is a step towards providing the types of guarantees and transparency that will enable us to confidently deploy RL in important high-stakes scenarios.
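
As a rough illustration of the logged-data problem the abstract describes (often called off-policy evaluation), the sketch below estimates a target policy's value from trajectories collected under a different logging policy using ordinary importance sampling. The data format, function names, and toy numbers are assumptions made for illustration only; they are not taken from the speaker's work, which concerns building better estimators than this simple baseline.

    # Minimal per-trajectory importance-sampling estimator for off-policy
    # evaluation: estimate a target policy's value from trajectories logged
    # under a different behavior policy. The data format and names here are
    # illustrative assumptions, not the speaker's method.
    import numpy as np

    def is_estimate(trajectories, target_policy, gamma=1.0):
        """Ordinary importance-sampling estimate of the target policy's value.

        trajectories: list of episodes; each episode is a list of
            (state, action, behavior_prob, reward) tuples, where behavior_prob
            is the probability the logging policy assigned to the action taken.
        target_policy: function (state, action) -> probability under the
            policy we want to evaluate.
        gamma: discount factor.
        """
        returns = []
        for episode in trajectories:
            weight, ret = 1.0, 0.0
            for t, (s, a, b_prob, r) in enumerate(episode):
                weight *= target_policy(s, a) / b_prob  # likelihood ratio
                ret += (gamma ** t) * r
            returns.append(weight * ret)
        # Unbiased but often high-variance; much of the research the abstract
        # alludes to is about estimators with better bias/variance trade-offs.
        return float(np.mean(returns))

    # Toy usage on a two-action, one-step log (hypothetical data).
    logged = [
        [("s0", 0, 0.5, 1.0)],
        [("s0", 1, 0.5, 0.0)],
        [("s0", 0, 0.5, 1.0)],
    ]
    greedy = lambda s, a: 1.0 if a == 0 else 0.0  # target: always pick action 0
    print(is_estimate(logged, greedy))  # ~1.33: upweights the action-0 episodes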

Bio
Emma Brunskill is an assistant professor in the Computer Science Department at Stanford University, where she leads the AI for Human Impact (@ai4hi) group. Her work focuses on reinforcement learning in high-stakes scenarios: how can an agent learn from experience to make good decisions when experience is costly or risky, such as in educational software, healthcare decision making, robotics, or other people-facing applications? She was previously on the faculty at Carnegie Mellon University. She is the recipient of multiple early faculty career awards (National Science Foundation, Office of Naval Research, Microsoft Research), and her group has received several best research paper nominations (CHI, EDM x2) and awards (UAI, RLDM).