[Thesis Defense] Generalizable Robot Manipulation through Unified Perception, Policy Learning, and Planning
Abstract:
Advancing robotic manipulation to achieve generalization across diverse goals, environments, and embodiments is a critical challenge in robotics research. While the availability of data and large-scale training has brought exciting progress in robotic manipulation, current methods often struggle to generalize to unseen, unstructured environments and to solve long-horizon tasks. In this thesis, I will present my contributions that bridge structured decision-making frameworks with learned perceptual and policy components to enable multi-step manipulation in partially observable environments. Specifically, I will talk about my work on 1) constructing a modular framework that integrates affordances estimated by learned perceptual models with task and motion planning (TAMP) for object rearrangement in unstructured scenes, 2) learning generative diffusion models of robot skills, which can be composed to satisfy unseen combinations of environmental constraints through inference-time optimization, and 3) leveraging large vision-language models (VLMs) to build task-oriented visual abstractions, allowing skills to generalize across different environments with only 5 to 10 demonstrations. Together, these approaches advance the generality and scalability of embodied agents toward solving real-world manipulation in unstructured environments.
Thesis Committee: Leslie Kaelbling, Tomás Lozano-Pérez, Russ Tedrake