Embodied Intelligence 2024-2025

Seminar Series

October 03

Foundations of High-Modality Multisensory AI

Paul Liang

MIT Media Lab

4:00 PM - 5:00 PM

Location

32-G449 (Stata Center, Patil/Kiva Conference Room)

Abstract: Building multisensory AI that learns from text, speech, video, real-world sensors, wearable devices, and medical data holds promise for impact in many scientific areas with practical benefits, such as supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. However, multimodal systems quickly run into data and modeling bottlenecks: it is increasingly difficult to collect paired multimodal data and scale multimodal transformers as the number of modalities and their dimensionality grow. In this talk, I propose a vision of high-modality learning: building multimodal AI over many diverse input modalities, given only partially observed subsets of data or model representations. We will cover 2 key ideas to enable high-modality learning: (1) discovering how modalities interact to give rise to new information, and (2) tackling the heterogeneity over many different modalities. Finally, I will discuss our collaborative efforts in scaling AI to many modalities and tasks for real-world impact on affective computing, mental health, and cancer prognosis.

Bio: Paul Liang is an Assistant Professor at MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and 3 best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.

September 26

Learning Robust, Real-world Visuomotor Skills from Generated Data

Ge Yang

MIT CSAIL

4:00 PM - 5:00 PM

Location

32-G449 (Stata Center, Patil/Kiva Conference Room)

Abstract: The mainstream approach in robot learning today relies heavily on imitation learning from real-world human demonstrations. These methods are sample efficient in controlled environments and easy to scale to a large number of skills. However, I will present algorithmic arguments to explain why merely scaling up imitation learning is insufficient for advancing robotics. Instead, my talk will focus on developing performant visuomotor policies in simulation and the techniques that make them robust enough to transfer directly to real-world color observations.

I will introduce LucidSim, our recent breakthrough in producing real-world perceptive robot policies from synthetic data. Using only generated images, we successfully trained a robot dog to perform parkour through obstacles at high speed, relying solely on a color camera for visual input. I will discuss how we generate diverse and physically accurate image sequences within simulated environments for learning, and address the system challenges we overcame to scale up. Finally, I will outline our push for versatility and plans to acquire three hundred language-aware visuomotor skills by the end of this year. These are the first steps toward developing fully autonomous, embodied agents that require deeper levels of intelligence.

Bio: Ge Yang is a postdoctoral researcher working with Phillip Isola at MIT CSAIL. His research focuses on developing the algorithmic and system foundations for computational visuomotor learning, with an emphasis on learning from synthetic data and sim-to-real transfer. Ge's work is dedicated to making robots capable, versatile, and intelligent.

Before transitioning into AI and robotics, Ge earned his Ph.D. in Physics from the University of Chicago and a Bachelor of Science in Mathematics and Physics from Yale University. His experience in physics motivated a multidisciplinary approach to problem-solving in AI. He is a recipient of the NSF Institute of AI and Fundamental Interactions Postdoc Fellowship and the Best Paper Award at the 2024 Conference on Robot Learning (CoRL), selected from 499 submissions.

September 19

Cultural Biases, World Languages, and Privacy Protection in Large Language Models

Wei Xu

Georgia Institute of Technology

4:00 PM - 5:00 PM

Location

45-792 (Schwarzman College of Computing)

Abstract: In this talk, I will highlight three key aspects of large language models: (1) cultural bias in LLMs and pre-training data, (2) a decoding algorithm for low-resource languages, and (3) human-centered design for real-world applications.

The first part focuses on systematically assessing LLMs' favoritism towards Western culture. We take an entity-centric approach to measure the cultural biases among LLMs (e.g., GPT-4, Aya, and mT5) through natural prompts, story generation, sentiment analysis, and named entity tasks. One interesting finding is that a potential cause of cultural biases in LLMs is the extensive use and upsampling of Wikipedia data during the pre-training of almost all LLMs. The second part will introduce a constrained decoding algorithm that can facilitate the generation of high-quality synthetic training data for fine-grained prediction tasks (e.g., named entity recognition, event extraction). This approach outperforms GPT-4 on many non-English languages, particularly low-resource African languages. Lastly, I will showcase an LLM-powered privacy preservation tool designed to safeguard users against the disclosure of personal information. I will share findings from an HCI user study that involves real Reddit users utilizing our tool, which in turn informs our ongoing efforts to improve the design of AI models.

Concluding the talk, I will briefly touch upon recent research exploring the temporal robustness of large language models (e.g., handling neologisms) and advances in human-AI interactive evaluation of LLM-generated texts.

Bio: Wei Xu is an Associate Professor in the College of Computing and Machine Learning Center at the Georgia Institute of Technology, where she is the director of the NLP X Lab. Her research interests are in natural language processing and machine learning, with a focus on Generative AI, robustness and fairness of large language models, multilingual LLMs, as well as interdisciplinary research in AI for science, education, accessibility, and privacy. She is a recipient of the NSF CAREER Award, AI for Everyone Award, Best Paper Award and Honorable Mention at COLING'18, ACL'23. She also received research funds from DARPA and IARPA. She is currently an executive board member of NAACL.