TALK: Thesis Defense: Multimodal Representation Learning for Agentic AI Systems

Speaker

Alex Andonian
MIT CSAIL

Host

Aude Oliva
MIT CSAIL

Abstract

Modern artificial intelligence (AI) is poised to transform the scientific process, from ideation and experimentation to peer review, and many researchers posit that emerging generalist AI agents will soon be not mere tools but equal partners in scientific exploration. In this work, we contribute to this evolving landscape through converging lines of research on developing and evaluating more efficient and interpretable AI systems across the vision and language domains, and on applying these systems to scientific evaluation and review.
Our research focuses on three key areas. First, we introduce a novel framework that improves the efficiency and robustness of cross-modal representation learning. Our approach uses progressive self-distillation and soft image-text alignments to model the many-to-many correspondences found in noisy web-harvested datasets. Extensive evaluation shows that our method consistently outperforms CLIP across multiple benchmarks and is more robust to natural distribution shifts.
Second, we extend this framework to zero-shot open-vocabulary detection, introducing augmentation, architectural, and self-training strategies that improve vision-text feature alignment. On long-tail detection benchmarks, the approach achieves state-of-the-art results, competitive performance on unseen classes, and superior transfer to additional datasets.
Finally, we present the Review Integrated Scientific Evaluation (RISE) benchmark, a novel framework for assessing how well AI systems understand, critique, and provide constructive feedback on scientific manuscripts. Our study compares AI-generated reviews against human expert evaluations, revealing both the promising capabilities and the current limitations of AI in scientific peer review.
The dissertation concludes by proposing future directions for AI-accelerated science, emphasizing the need for collaborative human-AI scientific communities and for evaluation methods that target higher-level autonomous capabilities in scientific domains. Altogether, this work contributes to the ongoing discourse on AI's role in scientific research and paves the way for more rigorous integration of AI systems into the scientific process.
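The soft image-text alignment idea from the first research area can be illustrated with a small sketch. It assumes a CLIP-style batch similarity matrix and blends the usual one-hot (matched-pair) targets with the model's own softmax predictions, using a weight `alpha` that ramps up over training. The function names `soft_clip_loss`, `blended_targets`, and `alpha_schedule`, the linear ramp, and the cap of 0.5 are hypothetical choices for illustration, not the thesis's exact formulation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(mat: Matrix) -> Matrix:
    """Swap rows and columns (image->text becomes text->image)."""
    return [list(col) for col in zip(*mat)]


def soft_cross_entropy(logits: Matrix, targets: Matrix) -> float:
    """Mean row-wise cross-entropy between target distributions and softmax(logits)."""
    loss = 0.0
    for row, tgt in zip(logits, targets):
        probs = softmax(row)
        loss -= sum(t * math.log(p) for t, p in zip(tgt, probs))
    return loss / len(logits)


def blended_targets(logits: Matrix, alpha: float) -> Matrix:
    """Mix one-hot (diagonal) targets with the model's own predictions.

    Assumption: the model itself acts as the teacher. In a real training loop
    its predictions would be detached from the graph; here they are plain floats.
    """
    n = len(logits)
    out = []
    for i, row in enumerate(logits):
        probs = softmax(row)
        out.append([(1 - alpha) * (1.0 if i == j else 0.0) + alpha * probs[j]
                    for j in range(n)])
    return out


def alpha_schedule(step: int, total_steps: int, alpha_max: float = 0.5) -> float:
    """Hypothetical linear ramp: alpha grows from 0 to alpha_max over training."""
    return alpha_max * min(step / total_steps, 1.0)


def soft_clip_loss(sim: Matrix, step: int, total_steps: int) -> float:
    """Symmetric image->text and text->image loss with blended soft targets.

    sim[i][j] is the (temperature-scaled) similarity of image i and text j.
    """
    alpha = alpha_schedule(step, total_steps)
    sim_t = transpose(sim)
    loss_i2t = soft_cross_entropy(sim, blended_targets(sim, alpha))
    loss_t2i = soft_cross_entropy(sim_t, blended_targets(sim_t, alpha))
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    # Toy 3x3 image-text similarity matrix (made-up numbers for illustration).
    sim = [[5.0, 1.0, 0.5], [0.8, 4.0, 3.5], [0.2, 3.0, 4.5]]
    for step in (0, 50, 100):
        print(step, round(soft_clip_loss(sim, step, total_steps=100), 4))
```

At `step=0` the targets are purely one-hot and the loss reduces to the standard symmetric CLIP cross-entropy. As `alpha` grows, some target probability shifts toward captions (or images) the model already rates as similar, which is one simple way to let a noisy web pair share credit with near-duplicate matches instead of treating every off-diagonal pair as a hard negative.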

Thesis Committee: Profs. Phillip Isola and Jacob Andreas