[Thesis Defense] Generalizable Robot Manipulation through Unified Perception, Policy Learning, and Planning
Abstract:
Advancing robotic manipulation to achieve generalization across diverse goals, environments, and embodiments is a critical challenge in robotics research. While the availability of data and large-scale training has brought exciting progress in robotic manipulation, current methods often struggle to generalize to unseen, unstructured environments and to solve long-horizon tasks. In this thesis, I will present my contributions that bridge structured decision-making frameworks with learned perceptual and policy components to enable multi-step manipulation in partially observable environments. Specifically, I will talk about my work on 1) constructing a modular framework that integrates affordances estimated by learned perceptual models with task and motion planning (TAMP) for object rearrangement in unstructured scenes, 2) learning generative diffusion models of robot skills, which can be composed to satisfy unseen combinations of environmental constraints through inference-time optimization, and 3) leveraging large vision-language models (VLMs) to build task-oriented visual abstractions, allowing skills to generalize across different environments with only 5 to 10 demonstrations. Together, these approaches advance the generality and scalability of embodied agents toward solving real-world manipulation in unstructured environments.
Thesis Committee: Leslie Kaelbling, Tomás Lozano-Pérez, Russ Tedrake