Learning from Value Function Intervals for Contact-Aware Robot Controllers
Speaker
Robin Deits
MIT CSAIL
Host
Russ Tedrake
MIT CSAIL
Abstract:
The problem of handling contact is central to the task of controlling a walking robot. Robots can only move through the world by exerting forces on their environment, and choosing where, when, and how to touch the world is the fundamental challenge of locomotion. Because the space of possible contacts is a high-dimensional mix of discrete and continuous decisions, it has historically been difficult or impossible to make complex contact decisions online at control rates. Even offline optimization of motions with contact generally suffers from poor local minima, non-convexity, and potentially non-unique solutions.
This thesis introduces LVIS (Learned Value Interval Supervision) which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function (or cost-to-go) rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate this technique on a simplified humanoid robot model and discuss how it compares to our prior work with the Atlas humanoid robot.
The problem of handling contact is central to the task of controlling a walking robot. Robots can only move through the world by exerting forces on their environment, and choosing where, when, and how to touch the world is the fundamental challenge of locomotion. Because the space of possible contacts is a high-dimensional mix of discrete and continuous decisions, it has historically been difficult or impossible to make complex contact decisions online at control rates. Even offline optimization of motions with contact generally suffers from poor local minima, non-convexity, and potentially non-unique solutions.
This thesis introduces LVIS (Learned Value Interval Supervision) which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function (or cost-to-go) rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate this technique on a simplified humanoid robot model and discuss how it compares to our prior work with the Atlas humanoid robot.