Foundations of High-Modality Multisensory AI

Speaker

Paul Liang
MIT Media Lab

Host

Jim Glass
MIT CSAIL
Abstract:
Building multisensory AI that learns from text, speech, video, real-world sensors, wearable devices, and medical data holds promise for impact across many scientific areas, with practical benefits such as supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. However, multimodal systems quickly run into data and modeling bottlenecks: it becomes increasingly difficult to collect paired multimodal data and to scale multimodal transformers as the number of modalities and their dimensionality grow. In this talk, I propose a vision of high-modality learning: building multimodal AI over many diverse input modalities, given only partially observed subsets of data or model representations. We will cover two key ideas that enable high-modality learning: (1) discovering how modalities interact to give rise to new information, and (2) tackling the heterogeneity across many different modalities. Finally, I will discuss our collaborative efforts in scaling AI to many modalities and tasks for real-world impact on affective computing, mental health, and cancer prognosis.

Bio:
Paul Liang is an Assistant Professor at the MIT Media Lab and MIT EECS. His research advances the foundations of multisensory artificial intelligence to enhance the human experience. He is a recipient of the Siebel Scholars Award, the Waibel Presidential Fellowship, the Facebook PhD Fellowship, the Center for ML and Health Fellowship, Rising Stars in Data Science, and three best paper awards. Outside of research, he received the Alan J. Perlis Graduate Student Teaching Award for developing new courses on multimodal machine learning.