Learning from Value Function Intervals for Contact-Aware Robot Controllers

Speaker

Robin Deits

MIT CSAIL

Host

Russ Tedrake

MIT CSAIL

Abstract:
The problem of handling contact is central to the task of controlling a walking robot. Robots can only move through the world by exerting forces on their environment, and choosing where, when, and how to touch the world is the fundamental challenge of locomotion. Because the space of possible contacts is a high-dimensional mix of discrete and continuous decisions, it has historically been difficult or impossible to make complex contact decisions online at control rates. Even offline optimization of motions with contact generally suffers from poor local minima, non-convexity, and potentially non-unique solutions.

This thesis introduces LVIS (Learned Value Interval Supervision), which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness by learning the optimal value function (or cost-to-go) rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we solve them only partially, extracting intervals that contain the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net that approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate this technique on a simplified humanoid robot model and discuss how it compares to our prior work with the Atlas humanoid robot.
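To make the idea of weak supervision from intervals concrete, here is a minimal Python sketch of one plausible training loss. It is an illustration only: the abstract does not specify the loss, so the squared-hinge form and the names `interval_loss` and `mean_interval_loss` are assumptions. The loss is zero whenever the predicted cost-to-go lies inside the interval [lower, upper] returned by a partial branch-and-bound solve, and grows with the distance outside it.

```python
from typing import Sequence, Tuple


def interval_loss(pred: float, lower: float, upper: float) -> float:
    """Hypothetical squared-hinge penalty: zero inside [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    below = max(lower - pred, 0.0)  # how far pred falls under the lower bound
    above = max(pred - upper, 0.0)  # how far pred rises over the upper bound
    return below ** 2 + above ** 2


def mean_interval_loss(
    preds: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
) -> float:
    """Average the interval penalty over a batch of (lower, upper) samples."""
    if len(preds) != len(intervals):
        raise ValueError("preds and intervals must have the same length")
    if not preds:
        raise ValueError("batch must not be empty")
    total = sum(interval_loss(p, lo, hi) for p, (lo, hi) in zip(preds, intervals))
    return total / len(preds)


if __name__ == "__main__":
    # Three predictions checked against the same illustrative interval [1, 3].
    batch_preds = [2.0, 0.0, 5.0]
    batch_intervals = [(1.0, 3.0), (1.0, 3.0), (1.0, 3.0)]
    print(mean_interval_loss(batch_preds, batch_intervals))
```

Under this assumed loss, a tighter interval (from a longer branch-and-bound run) constrains the network more strongly, while a loose interval still provides usable signal without requiring a fully solved sample.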

Date

October 15, 2018, 12:30 PM to 1:30 PM (America/New_York)

Location

Seminar Room G449 (Patil/Kiva)

Organizer & Contact

Robin Deits

rdeits@csail.mit.edu

Part of

Thesis Defense
