Embodied Intelligence (EI) Joint Seminar Presentation

Speaker

Xiaolin Fang & Aditya Agarwal
MIT CSAIL

Host

Tomás Lozano-Pérez
MIT CSAIL

There will be a joint presentation this week by two MIT CSAIL PhD candidates working with Profs. Leslie Kaelbling and Tomás Lozano-Pérez.

Title:  KALM: Keypoint Abstraction using Large Models for Object-Relative Imitation Learning

Presenter:  Xiaolin Fang

Abstract: 
How can robots pick up skills that generalize across diverse objects and environments with few examples, so that we can scale up robot learning to a wide range of tasks? In this talk, I'll introduce KALM, a framework that uses pre-trained vision-language models to automatically generate task-relevant and consistent keypoints, which can be used to guide a diffusion action model. KALM enables robots to learn keypoint-conditioned policies that generalize across object poses, camera views, and new instances with only 5 to 10 demonstrations. I’ll discuss key findings and real-world results that demonstrate how this approach can lead to more scalable and generalizable robot learning.

Bio:
Xiaolin Fang is a PhD candidate at MIT CSAIL, working with Leslie Kaelbling and Tomás Lozano-Pérez on robot learning and planning. With an emphasis on generalization to new goals and novel environments, especially for long-horizon manipulation tasks, she explores approaches that integrate foundation models, generative learning, and structured planning to develop generalizable robot skills. Previously, she spent time working with Prof. Dieter Fox at the NVIDIA Robotics Lab.

Title:  How Robots Learn to See — Building Open-World 3D Scene Representations for Robot Perception

Presenter:  Aditya Agarwal 

Abstract: 
Robots operating in unstructured, everyday environments must be able to perceive and reason about the world beyond what they can directly observe. Achieving robust manipulation in such settings requires a rich, complete understanding of complex 3D scenes from sparse and noisy sensory inputs. In this talk, I will introduce SceneComplete, a system that composes general-purpose foundation models for open-world 3D scene completion to support reliable grasping and manipulation. I will highlight the challenges in building such a system for real-world environments and discuss future directions toward general-purpose robotic perception in open-world settings.

Bio:
Aditya Agarwal is a second-year PhD candidate at MIT CSAIL, working with Profs. Leslie Kaelbling and Tomás Lozano-Pérez on robot learning and perception. His research aims to bridge the gap between perception and robot learning to enable general-purpose robot manipulation by integrating foundation models, generative modeling, and representation learning. He completed his master's degree at IIIT Hyderabad and has spent time at Mila in Canada and at Microsoft Research India.