TALK: Variable Risk Policy Search for Dynamic Robot Control
Speaker: Scott Kuindersma, University of Massachusetts Amherst
Date: Monday, August 13 2012
Time: 2:00PM to 3:00PM
Location: 32-G449 (Patil/Kiva)
Host: Russ Tedrake, MIT
Contact: Mieke Moran, 617-253-5817, email@example.com
A central goal of the robotics community is to develop general optimization algorithms for producing high-performance dynamic behaviors in robot systems. This goal is challenging because many robot control tasks are characterized by significant stochasticity, high dimensionality, expensive evaluations, and the fact that dynamic system models are often unknown or unreliable. Despite these challenges, a range of algorithms exist for performing efficient optimization of parameterized control policies with respect to average cost criteria. However, other statistics of the cost may also be important. In particular, for many stochastic control problems, it can be advantageous to select policies based not only on their average cost, but also on their cost variance (or risk).
In this talk, I discuss efficient global and local risk-sensitive stochastic optimization algorithms suitable for performing policy search in a variety of problems of interest to robotics researchers. These algorithms exploit new techniques in nonparametric heteroscedastic regression to directly model the policy-dependent distribution of cost. For local search, learned cost models can be used as critics for performing risk-sensitive gradient descent. Alternatively, decision-theoretic criteria can be applied to globally select policies to balance exploration and exploitation in a principled way, or to perform greedy minimization with respect to various risk-sensitive criteria. This separation of learning and policy selection leads to variable risk control, where risk sensitivity can be flexibly adjusted and appropriate policies can be selected at runtime without requiring additional policy executions.
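As a rough illustration of the variable risk idea, the sketch below fits a simple cost model once and then reselects a policy for different risk settings without gathering new data. Everything in it is an assumption made for illustration rather than the speaker's method: a Gaussian-kernel smoother stands in for the nonparametric heteroscedastic regression, the selection rule "mean + kappa * standard deviation" is one common risk-sensitive criterion, and the names `kernel_cost_model`, `select_policy`, and `kappa`, along with the rollout data, are hypothetical.

```python
import math


def kernel_cost_model(thetas, costs, query, bandwidth=0.5):
    """Kernel-weighted estimate of the cost mean and variance at `query`.

    Illustrative stand-in for a heteroscedastic cost model: the variance
    estimate is allowed to differ from one policy parameter to another.
    """
    weights = [math.exp(-0.5 * ((t - query) / bandwidth) ** 2) for t in thetas]
    total = sum(weights)
    mean = sum(w * c for w, c in zip(weights, costs)) / total
    var = sum(w * (c - mean) ** 2 for w, c in zip(weights, costs)) / total
    return mean, var


def select_policy(thetas, costs, candidates, kappa):
    """Return the candidate minimizing predicted mean + kappa * std of cost.

    kappa > 0 is risk-averse, kappa = 0 is risk-neutral, and kappa < 0 is
    risk-seeking (assumed convention for this sketch).
    """
    def risk_adjusted_cost(theta):
        mean, var = kernel_cost_model(thetas, costs, theta)
        return mean + kappa * math.sqrt(var)

    return min(candidates, key=risk_adjusted_cost)


# Hypothetical rollout data: policy parameter and observed cost per trial.
# Parameter 0.0 has a lower average cost but high variance; parameter 2.0
# has a slightly higher average cost but is very consistent.
thetas = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
costs = [1.0, 9.0, 1.0, 9.0, 6.0, 6.2, 5.8, 6.0]
candidates = [0.0, 2.0]

# The same data and model serve every risk setting; only kappa changes.
risk_neutral_choice = select_policy(thetas, costs, candidates, kappa=0.0)
risk_averse_choice = select_policy(thetas, costs, candidates, kappa=1.0)
```

With this toy data the risk-neutral setting favors the high-variance parameter and the risk-averse setting favors the consistent one, which mirrors the point that risk sensitivity can be changed at selection time without additional policy executions.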
To evaluate these algorithms and highlight the importance of risk in dynamic control tasks, I describe several experiments with the UMass uBot-5 that include learning dynamic arm motions to stabilize after large impacts, lifting heavy objects while balancing, and developing safe fall bracing behaviors. The results of these experiments suggest that the ability to select policies based on risk-sensitive criteria can lead to greater flexibility in dynamic behavior generation.
Scott Kuindersma is a PhD candidate in the Computer Science Department at the University of Massachusetts Amherst working under the supervision of Roderic Grupen and Andrew Barto. His dissertation work involves developing efficient risk-sensitive policy search algorithms and applying them to various dynamic robot control tasks. This work has led to the development of several postural stability, recovery, and manipulation controllers for the uBot-5 humanoid mobile manipulator. Scott is supported by a NASA GSRP Fellowship from Johnson Space Center, where he has also contributed to control and application development for Robonaut 2.