Learning to Reason with LLMs

Speaker

Noam Brown

OpenAI

Host

Jacob Andreas

CSAIL

Large language models (LLMs) have demonstrated remarkable capabilities in generating coherent text and completing various natural language tasks. Nevertheless, their ability to perform complex, general reasoning has remained limited. In this talk, I will describe OpenAI's new o1 model, an LLM trained via reinforcement learning to generate a hidden chain of thought before its response. We have found that the performance of o1 consistently improves with more reinforcement learning compute and with more inference compute. o1 surpasses previous state-of-the-art models in a variety of benchmarks that require reasoning, including mathematics competitions, programming contests, and advanced science question sets. I will discuss the implications of scaling this paradigm even further.

Date and Time

January 30, 2025, 2:00 PM to 3:30 PM (America/New_York)

Location

Seminar Room G449 (Patil/Kiva)

Organizer & Contact

Jacob Andreas

jda@csail.mit.edu
