Foundations of High-Modality Multisensory AI
MIT Media Lab
2024-10-03 16:00:00
2024-10-03 17:00:00
America/New_York
Abstract: Building multisensory AI that learns from text, speech, video, real-world sensors, wearable devices, and medical data holds promise for impact in many scientific areas with practical benefits, such as supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. However, multimodal systems quickly run into data and modeling bottlenecks: it becomes increasingly difficult to collect paired multimodal data and to scale multimodal transformers as the number of modalities and their dimensionality grow. In this talk, I propose a vision of high-modality learning: building multimodal AI over many diverse input modalities, given only partially observed subsets of data or model representations. We will cover two key ideas to enable high-modality learning: (1) discovering how modalities interact to give rise to new information, and (2) tackling the heterogeneity across many different modalities. Finally, I will discuss our collaborative efforts in scaling AI to many modalities and tasks for real-world impact on affective computing, mental health, and cancer prognosis.

Bio: Paul Liang is an Assistant Professor at MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, Waibel Presidential Fellowship, Facebook PhD Fellowship, Center for ML and Health Fellowship, Rising Stars in Data Science, and three best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.
32-G449 (Stata Center, Patil/Kiva Conference Room)