TALK: Thesis Defense: Multimodal Representation Learning for Agentic AI Systems

Speaker

Alex Andonian

MIT CSAIL

Host

Aude Oliva

MIT CSAIL

Abstract:
Modern artificial intelligence (AI) is poised to transform the scientific process, from ideation and experimentation to peer review, with many researchers positing that emerging generalist AI agents will soon no longer be mere tools, but equal partners in scientific exploration. In this work, we contribute to this evolving landscape through converging lines of research focused on developing and evaluating more efficient and interpretable AI systems, spanning both vision and language domains, and their applications to scientific evaluation and review.
Our research focuses on three key areas. First, we introduce a novel framework to enhance the efficiency and robustness of cross-modal representation learning methods. Our approach utilizes progressive self-distillation and soft image-text alignments to model the many-to-many correspondences found in noisy web-harvested datasets. Extensive evaluation demonstrates that our method consistently outperforms CLIP across multiple benchmarks, including improved robustness to natural distribution shifts. Second, we extend this framework to zero-shot open-vocabulary detection, introducing augmentation, architectural, and self-training strategies for improving vision-text feature alignment. Evaluation on long-tail detection benchmarks demonstrates state-of-the-art performance, with competitive performance for unseen classes, as well as superior transfer to additional datasets. Finally, we present the Review Integrated Scientific Evaluation (RISE) benchmark, a novel framework for assessing AI performance in understanding, critiquing, and providing constructive feedback on scientific manuscripts. Our study compares AI-generated reviews against human expert evaluations, revealing both the promising capabilities and current limitations of AI in scientific peer review.
The dissertation concludes by proposing future directions for AI-accelerated science, emphasizing the need for collaborative human-AI scientific communities and the development of evaluation methods for higher-level autonomous capabilities in scientific domains. Altogether, this work contributes to the ongoing discourse on AI's role in scientific research and paves the way for more rigorous integration of AI systems into the scientific process.
Thesis Committee: Profs. Phillip Isola and Jacob Andreas

Date

August 15, 2024, 2:00 PM to 3:00 PM (America/New_York)

Location

D463 (Star)

Organizer & Contact

Alex Andonian

andonian@mit.edu
