[ML+Crypto] Why Language Models Hallucinate

Speaker

Adam Tauman Kalai
OpenAI

Host

Shafi, Yael, Jonathan and Vinod

ML+Crypto Seminar

Title: Why Language Models Hallucinate
Speaker: Adam Tauman Kalai (OpenAI)
Time: Tuesday, November 4, 10:30am–12:30pm
Location: 32-G575
Seminar series: ML+Crypto

Large language models (LLMs) sometimes generate statements that are plausible but factually incorrect—a phenomenon commonly called “hallucination.” We argue that these errors are not mysterious failures of architecture or reasoning, but rather predictable consequences of standard training and evaluation incentives. 

We show (i) that hallucinations can be viewed as classification errors: when pretrained models cannot reliably distinguish a false statement from a true one, they may produce the false option rather than saying “I don’t know”; (ii) that optimizing for benchmark performance encourages guessing rather than abstaining, since most evaluation metrics penalize expressing uncertainty; and (iii) that a possible mitigation path lies in revising existing benchmarks to reward calibrated abstention, thus realigning incentives in model development.
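To make the incentive argument in (ii) concrete, here is a minimal Python sketch. It is not taken from the talk or the paper; the scoring rule and the penalty value are illustrative assumptions. It compares the expected score of guessing versus abstaining: under plain accuracy scoring, guessing always dominates abstention, while a wrong-answer penalty makes abstention the better choice below a confidence threshold.

```python
# Illustrative sketch (assumed, not from the paper): expected score on one
# benchmark question under two scoring rules. With no penalty for wrong
# answers, guessing strictly dominates saying "I don't know"; with a penalty,
# abstaining wins whenever confidence is below a threshold.

def expected_score(p_correct: float, abstain: bool, wrong_penalty: float) -> float:
    """Expected score for a single question.

    p_correct     -- model's probability of answering correctly if it guesses
    abstain       -- whether the model answers "I don't know" (scores 0)
    wrong_penalty -- points deducted for an incorrect guess (0 = plain accuracy)
    """
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


if __name__ == "__main__":
    p = 0.3  # a low-confidence question

    # Plain accuracy: guessing (0.3) beats abstaining (0.0), so guessing is incentivized.
    print(expected_score(p, abstain=False, wrong_penalty=0.0))  # 0.3
    print(expected_score(p, abstain=True, wrong_penalty=0.0))   # 0.0

    # Penalized scoring (-1 per wrong answer): abstaining is better whenever
    # p_correct < wrong_penalty / (1 + wrong_penalty), i.e. below 0.5 here.
    print(expected_score(p, abstain=False, wrong_penalty=1.0))  # -0.4
    print(expected_score(p, abstain=True, wrong_penalty=1.0))   # 0.0
```

Scoring rules of this penalized kind, which reward calibrated abstention rather than lucky guesses, are the sort of benchmark revision the abstract points to in (iii).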

Joint work with Santosh Vempala (Georgia Tech) and Ofir Nachum & Edwin Zhang (OpenAI).