February 15 '16

Risk, reward and robots

“The problem with planning contingencies is that there are so many things that can go wrong, if you generated plans for all possible contingencies, you’d go nuts,” Brian Williams says.

Planning algorithms are widely used in logistics and control. They can help schedule flights and bus routes, guide autonomous robots, and determine control policies for the power grid, among other things.

In recent years, planning algorithms have begun to factor in uncertainty — variations in travel time, erratic communication between autonomous robots, imperfect sensor data, and the like. That causes the scale of the planning problem to grow exponentially, but researchers have found clever ways to solve it efficiently.

Now, researchers at CSAIL and the Australian National University (ANU) have made the problem even more complex, by developing a planning algorithm that also generates contingency plans, should the initial plan prove too risky. It also identifies the conditions — say, sensor readings or delays incurred — that should trigger a switch to a particular contingency plan.

Despite the extra labor imposed by generating contingency plans, the algorithm still provides mathematical guarantees that its plans’ risk of failure falls below some threshold, which the user sets.

“The problem with planning contingencies is that there are so many things that can go wrong, if you generated plans for all possible contingencies, you’d go nuts,” says Brian Williams, a professor of aeronautics and astronautics whose group at CSAIL developed the new system. “So then the question ends up being, ‘How many contingencies do you generate?’”

Pedro Santana, a graduate student in aeronautics and astronautics, is first author on a paper describing the system, which he presented at the annual meeting of the Association for the Advancement of Artificial Intelligence last weekend. He’s joined by Williams and Sylvie Thiébaux, a professor of computer science at the Australian National University and a researcher with Australia’s National Information Communications Technology Australia (NICTA) research program, which has a partnership with MIT.

Probabilistic pruning

As Williams explains, the range of possible decisions that a planner faces can be represented as a data structure called a graph. A graph consists of nodes, which are usually represented as circles, and edges, which are represented as line segments connecting the nodes. Network diagrams and flow charts are familiar examples of a graph.

In a planning system, each node of the graph represents a decision point, such as, “Should I take the bus or the subway?” A path through the graph can be evaluated according to the rewards it offers — you reach your destination safely — and the penalties it imposes — you’ll be five minutes late. The optimal plan is the one that maximizes reward.

Factoring in probabilities makes that type of reward calculation much more complex: The average bus trip might be 15 minutes, but there’s some chance that it will be 35; the average subway trip might be 18 minutes, but it’s almost never more than 24. In that context, for even a relatively simple planning task, canvassing contingency plans can be prohibitively time consuming.

To make the problem tractable, the MIT and ANU researchers borrowed a technique from some earlier work from Williams’ group. Before the planner begins constructing the graph, it asks the user to set risk thresholds. A researcher trying to develop a data-gathering plan for a multimillion-dollar underwater robot, for example, might be satisfied with a 90 percent probability that the robot will take all the sensor readings it’s supposed to — but they might want a 99.9 percent probability that the robot won’t collide with a rock face at high speed.

The researchers’ algorithm treats these thresholds as a “risk budget” that it spends as it explores paths through the graph. If the planner determines that a given branch of the graph will exceed the budget, it lops it off.

Staying optimistic

That determination has to be rapid, however. So the researchers use some simple rules of thumb — or, in computer parlance, heuristics — to evaluate branches. Every path through a given branch, for instance, might include a different car route between two points, each with its own probability distribution of possible travel times. But if traversing a straight line between the points, at the maximum allowed speed, will still incur intolerable delays, there’s no point in evaluating probabilities for every route. The branch can be eliminated.

As long as the heuristics are optimistic — possibly underestimating risk but never overestimating it — the planner can lop off branches without compromising the quality of the final plans. Sometimes those heuristics are application-specific, like the one that evaluates routes geometrically. But sometimes they’re not.

For instance, one of the reasons that probability distributions add so much complexity to planning calculations is that they’re nonlinear: Their graphs take on shapes that are more complicated than simple lines. In a paper being presented at the International Conference on Automated Planning and Scheduling in June, Santana — again first author — Williams, and colleagues at MIT, the University of São Paulo in Brazil, and Caltech describe a way to produce linear approximations of probability distributions that are much easier to work with mathematically.

Those approximations are optimistic: They never overestimate risk. But a computer can evaluate them thousands of times faster than it can a nonlinear distribution. Such heuristics offer hope that the researchers’ planning system could update plans on the fly, in light of new information, as well as generating contingency plans in advance.

"The authors have tackled a very important problem that is highly relevant to NASA missions, where we are increasingly reliant on autonomous behaviors programmed into our spacecraft,” says Michel Ingham, technical group supervisor for the System Architectures and Behaviors group at NASA’s Jet Propulsion Laboratory.

“Their work is especially relevant for in-situ missions on the surface of asteroids and comets, planetary bodies like Mars, or not-too-distant-future missions to outer-planet icy moons like Europa and Enceladus. Our robotic spacecraft exploring these locales will be challenged to execute their missions efficiently in the presence of significant uncertainty in both their own state and the state of the surrounding environment, while avoiding obstacles and hazards. They will need to appropriately balance accomplishing their scientific and exploration objectives with keeping themselves healthy and out of trouble,” he added.

“The authors have provided a method and capability for planning a robot’s activities, which optimizes the reward associated with accomplishing mission objectives while bounding the ‘execution risk’ taken by the mission. This means that our future mission operators could be able to specify how conservative or aggressive they want the spacecraft to be by choosing an appropriate risk bound and have some guarantee that the spacecraft will not exceed this level of risk."