How Cognitive Probes guide development of generative AI by diagnosing intellectual deficits

Speaker

Isaac Galatzer-Levy
Google DeepMind

Host

Boris Katz
CSAIL

Users of large language models increasingly observe a wide range of performance deficits, yet the mechanistic causes of these errors remain poorly understood and poorly characterized. This hampers systematic efforts to address them and to achieve artificial general intelligence (AGI). To address this challenge, Dr. Isaac Galatzer-Levy will present a novel framework for comprehensive cognitive evaluation of foundation models, applying principles from human psychometrics to characterize their abilities and deficits. This approach reveals profound performance asymmetries in leading models such as Gemini: while they exhibit superhuman capabilities in verbal and working memory tasks, they show severe deficits in visual-perceptual and intuitive physics domains. These weaknesses are particularly evident in tasks requiring visual reasoning, complex puzzle-solving, and even basic perception, where models often perform far below human norms. Such deficits will limit progress in multimodal and robotics applications of generative AI. The research employs a wide range of testing paradigms, from one-way model evaluation on static benchmarks to interactive social agent evolution in multi-agent simulations. Dr. Galatzer-Levy will conclude by demonstrating how this detailed cognitive profiling can be applied to identify and remediate reasoning errors in AI that are analogous to human cognitive distortions and that can lead to the generation of delusions.

Speaker Bio: Dr. Galatzer-Levy holds a PhD in Clinical Psychology from Columbia University. He is on the research faculty at NYU Grossman School of Medicine in the Department of Psychiatry, where he received postdoctoral training in neuroscience and bioinformatics. He has worked across start-ups and big tech (Meta Reality Labs; Google DeepMind) on applications of psychological constructs to the development of AI models ranging from large sensor models to foundational GenAI research and development. He holds multiple patents in these areas and has over 100 peer-reviewed publications.