Improving and Proving: Reinforcement Learning for Proof Shortening and Hierarchical Theorem Proving in Lean

Speaker

MIT CSAIL

Host

MIT CSAIL

RSVP here: https://forms.gle/thXsAVZkNUy6Bgwt6


Reinforcement learning (RL) has recently emerged as a powerful driver of progress in formal theorem proving, enabling near–IMO-level performance. However, it has also encouraged models to produce long, redundant, and opaque proofs. In this talk, I will present two complementary ways RL can enhance automated reasoning in Lean. First, I will introduce ProofOptimizer, which applies RL to fine-tune a model that simplifies these overcomplicated proofs. Combining symbolic linting, a 7B model trained with expert iteration, and iterative refinement, ProofOptimizer reduces proof length by up to 87% on MiniF2F and 57% on PutnamBench. Second, I will show how coupling online RL with hierarchical (lemma-based) inference yields more scalable theorem proving: RL successfully trains proof-decomposition policies, while hierarchical inference improves exploration and data efficiency. Together, these results illustrate how RL can be used both to improve existing proofs and to prove new theorems more effectively.
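To give a flavor of the proof-shortening problem, here is a hypothetical toy example (not drawn from the talk's benchmarks) of the kind of redundant Lean 4 proof an RL-trained prover might emit, alongside the compressed form an optimizer would aim for:

```lean
-- A deliberately roundabout proof, typical of RL-trained provers:
-- it introduces an unnecessary intermediate hypothesis before closing the goal.
example (a b : Nat) : a + b = b + a := by
  have h : a + b = b + a := by
    exact Nat.add_comm a b
  exact h

-- The shortened equivalent a proof optimizer would target:
-- the same theorem closed directly with the library lemma.
example (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

Real generated proofs are, of course, far longer, but the optimization objective is the same: produce a semantically equivalent proof that still type-checks while eliminating redundant steps.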