The Matrix: A Bayesian Learning Model for LLMs

Speaker

Vishal Misra, Columbia University

Host

Hari Balakrishnan

Abstract: In this talk, we present a Bayesian learning model for analyzing the behavior of Large Language Models (LLMs), focusing on their next-token prediction objective. We introduce a novel approach based on a generative text model that uses a multinomial transition probability matrix with a prior, and we show how LLMs approximate this framework. We explore the relationship between embeddings and multinomial distributions, and the use of the Dirichlet approximation theorem to approximate the prior. We also discuss how LLM text generation aligns with Bayesian principles and examine the role of in-context learning, particularly in larger models, where prompt tokens are treated as samples that update the posterior. Our findings shed light on the Bayesian nature of LLMs and its implications for in-context learning and potential applications.
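
To make the abstract's framing concrete, here is a minimal sketch of the Dirichlet-multinomial view it describes: a row of a transition matrix over next tokens carries a Dirichlet prior, and tokens observed in the prompt act as multinomial samples that update the posterior predictive. This is an illustration under assumptions, not the speaker's actual model; the toy vocabulary and the class name DirichletNextToken are invented for the example.

```python
import numpy as np

VOCAB = ["the", "cat", "sat", "mat"]  # toy vocabulary (assumption)
V = len(VOCAB)

class DirichletNextToken:
    """Posterior predictive over next tokens under a Dirichlet prior."""

    def __init__(self, alpha: float = 1.0):
        # Symmetric Dirichlet prior on the next-token distribution;
        # alpha acts as a pseudo-count for every vocabulary entry.
        self.alpha = np.full(V, alpha)
        self.counts = np.zeros(V)

    def update(self, token_id: int) -> None:
        # Treat an observed prompt token as one multinomial sample,
        # mirroring the abstract's "prompts as updatable samples".
        self.counts[token_id] += 1

    def predictive(self) -> np.ndarray:
        # Dirichlet-multinomial posterior predictive:
        # (counts + alpha) normalized to a probability vector.
        post = self.counts + self.alpha
        return post / post.sum()

model = DirichletNextToken(alpha=0.5)
for tok in ["cat", "sat", "cat"]:  # tokens observed "in context"
    model.update(VOCAB.index(tok))
print(dict(zip(VOCAB, model.predictive().round(3))))
```

In this reading, in-context learning is just Bayesian updating: each prompt token shifts the predictive distribution away from the prior, which is the behavior the talk attributes to larger models.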

Bio: Vishal Misra is a Professor in the Departments of Computer Science and Electrical Engineering at Columbia University and the Vice Dean for Computing and AI in the School of Engineering. He is an ACM and IEEE Fellow, and his research emphasizes mathematical modeling of systems, bridging the gap between practice and analysis. As a graduate student, he co-founded CricInfo, which was acquired by ESPN in 2007. In 2021 he developed one of the world's first commercial applications built on top of GPT-3 for ESPNCricinfo, and he has since been modeling the behavior of LLMs. He also played an active part in the Net Neutrality regulation process in India, where his definition of Net Neutrality was adopted both by the citizens' movement and by the regulators. He has received a Distinguished Alumnus Award from IIT Bombay (2019) and a Distinguished Young Alumnus Award from the UMass Amherst College of Engineering (2014).