November 19

Integrating Formal and Informal Reasoning
6:00–7:00 PM, America/New_York
RSVP here: https://forms.gle/b4hXBEVJuPtB3z157

ML models for automated theorem proving are usually trained to output entire proofs as text in the Lean proof assistant. With a little ingenuity, we can devise a better interface that is significantly easier to use, plays to the strengths of ML, and reveals the structure of mathematics. In this talk, I will share lessons from a few years of investigation into building future-proof infrastructure for automated theorem proving as it relates to learning-based methods.

November 05

Improving and Proving: Reinforcement Learning for Proof Shortening and Hierarchical Theorem Proving in Lean
6:00–7:00 PM, America/New_York
RSVP here: https://forms.gle/thXsAVZkNUy6Bgwt6

Reinforcement learning (RL) has recently emerged as a powerful driver of progress in formal theorem proving, enabling near-IMO-level performance. However, it has also encouraged models to produce long, redundant, and opaque proofs. In this talk, I will present two complementary ways RL can enhance automated reasoning in Lean. First, I will introduce ProofOptimizer, which applies RL to fine-tune a model that simplifies these overcomplicated proofs. Combining symbolic linting, a 7B model trained with expert iteration, and iterative refinement, ProofOptimizer reduces proof length by up to 87% on MiniF2F and 57% on PutnamBench. Second, I will show how coupling online RL with hierarchical (lemma-based) inference yields more scalable theorem proving: RL successfully trains proof-decomposition policies, while hierarchical inference improves exploration and data efficiency. Together, these results illustrate how RL can both improve existing proofs and prove new theorems more effectively.
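The lint-propose-verify loop described in the abstract can be sketched roughly as follows. This is a minimal illustration only: `lint`, `propose_shorter`, and `verifies` are hypothetical stand-ins (the real system uses Lean's proof checker and a fine-tuned 7B model, not these toy rules).

```python
# Hedged sketch of an iterative proof-shortening loop in the spirit of
# ProofOptimizer. Every component below is a toy stand-in, NOT the real
# implementation: the actual system lints symbolically, rewrites with a
# 7B model trained via expert iteration, and re-checks proofs in Lean.

def lint(proof: str) -> str:
    """Symbolic pass: drop obviously redundant lines (stand-in rule)."""
    return "\n".join(ln for ln in proof.splitlines() if ln.strip() != "skip")

def propose_shorter(proof: str) -> str:
    """Stand-in for the fine-tuned model's rewrite; here, just dedupe lines."""
    seen, out = set(), []
    for ln in proof.splitlines():
        if ln not in seen:
            seen.add(ln)
            out.append(ln)
    return "\n".join(out)

def verifies(proof: str) -> bool:
    """Stand-in for re-checking the candidate proof in Lean."""
    return "sorry" not in proof

def optimize(proof: str, rounds: int = 3) -> str:
    best = proof
    for _ in range(rounds):  # iterative refinement
        candidate = propose_shorter(lint(best))
        # Keep only candidates that still check and are strictly shorter.
        if verifies(candidate) and len(candidate) < len(best):
            best = candidate
        else:
            break
    return best

toy = "intro h\nskip\nintro h\nexact h"
print(optimize(toy))  # -> "intro h\nexact h"
```

The key invariant, which the sketch preserves, is that a shortened proof is only accepted after re-verification, so optimization can never trade correctness for length.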

October 15

H-Nets: Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
6:00–7:00 PM, America/New_York
RSVP here: https://forms.gle/Es9S5VtgiGA8fvkd9

Dynamic chunking (arXiv:2507.07955) is a recently introduced method for training end-to-end hierarchical sequence models (H-Nets). H-Nets model sequences while explicitly chunking them into higher-order concepts, allowing them to train on raw UTF-8 bytes without the need for tokenization and to learn sparser language representations than subword tokens. On language modeling tasks, H-Nets outperform a standard tokenized transformer baseline and match the performance of a much larger transformer. They also perform well on other language modeling settings (Chinese and code), learning data-dependent, context-aware chunking schemes. In this talk, we discuss the methodology and ideas behind H-Net and share some motivation and context for the work.
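To make "content-dependent chunking over raw bytes" concrete, here is a deliberately crude toy. In H-Net the boundaries come from a learned routing module over hidden states; this stand-in instead places a boundary wherever the byte "class" changes, purely to illustrate the input/output shape of such a chunker.

```python
# Toy illustration of content-dependent chunking over raw UTF-8 bytes,
# loosely inspired by H-Net's dynamic chunking. The real model LEARNS
# boundary probabilities end-to-end; this stand-in uses a fixed,
# hand-written dissimilarity rule instead.

def byte_class(b: int) -> str:
    if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
        return "alpha"
    if 0x30 <= b <= 0x39:
        return "digit"
    if b >= 0x80:
        return "multibyte"  # lead/continuation bytes of non-ASCII characters
    return "other"

def dynamic_chunks(data: bytes) -> list[bytes]:
    chunks, start = [], 0
    for i in range(1, len(data)):
        # Boundary where adjacent bytes look dissimilar (stand-in for a
        # learned routing module's boundary probability crossing 0.5).
        if byte_class(data[i]) != byte_class(data[i - 1]):
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

print(dynamic_chunks("abc123 déf".encode("utf-8")))
# -> [b'abc', b'123', b' ', b'd', b'\xc3\xa9', b'f']
```

Note that concatenating the chunks always reconstructs the original byte string, so the chunker only regroups the sequence; it never loses information, which is the property that lets a hierarchical model operate on the coarser stream.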

October 08

Output Supervision Can Obfuscate the Chain of Thought
6:00–7:00 PM, America/New_York
RSVP here

Recently, OpenAI showed that training against a chain-of-thought (CoT) monitor can cause obfuscated CoTs, which contain bad behavior the monitor cannot detect. They proposed keeping CoTs monitorable by training only against output monitors that do not have access to the CoT. We show that such training can still cause obfuscated CoTs via two mechanisms. First, when a model is trained to produce a safe-looking output, it may generalize to making its CoTs look safe. Second, since later tokens are conditioned on earlier ones, safe-looking CoTs may increase the likelihood of safe outputs, causing safe-looking CoTs to be reinforced. We introduce two mitigations to address these issues, which achieve a Pareto improvement in monitorability and task performance compared to regular training. To our knowledge, we are the first to identify and mitigate these problems. Our work implies that preserving CoT monitorability is more difficult than previously thought; we suggest practical guidelines for AI developers to maintain monitorable CoTs.

Initial research note: https://www.alignmentforum.org/posts/CM7AsQoBxDW4vhkP3/optimizing-the-final-output-can-obfuscate-cot-research-note

October 01

AI Reasoning at Scale with Search
6:00–7:00 PM, America/New_York
RSVP here: https://forms.gle/iztowTvkndwa9nBM8

Large Language Models (LLMs) have shown impressive generalization across a wide range of tasks, yet they still struggle with complex reasoning and out-of-distribution problem solving. Rather than simply memorizing patterns from pretraining, we seek LLMs that can innovate: generating novel solutions in unfamiliar domains. In this talk, I present a common framework for integrating search-based techniques with LLMs to push the boundaries of their reasoning capabilities. By shifting computational effort from training time to inference time, we enable a new paradigm of inference-time scaling, where search becomes a mechanism for exploration, deliberation, and improvement. Unlike classical search over symbolic states or action spaces, LLM-guided search must operate over open-ended text, requiring novel approaches that are language-centric and model-aware. Through applications in strategy games, code generation, and mathematical problem solving, I will illustrate how these search-augmented methods unlock human-level performance in challenging, unfamiliar environments, paving the way toward more general and superhuman AI systems.
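The simplest instance of the inference-time-scaling idea above is best-of-N search: sample several candidate solutions and keep the one a verifier scores highest. The sketch below is hypothetical throughout: `llm_propose` stands in for sampling from an LLM, and `score` stands in for a verifier or learned value function; neither reflects the speaker's actual system.

```python
import random

# Minimal sketch of inference-time scaling via best-of-N search.
# `llm_propose` is a hypothetical stand-in for sampling a candidate
# answer from an LLM; `score` is a stand-in for a verifier or learned
# value function over open-ended text. Neither is a real model call.

def llm_propose(problem: str, rng: random.Random) -> str:
    # Stand-in: "sample" a noisy candidate answer for 12 * 34.
    return str(12 * 34 + rng.choice([-2, -1, 0, 1, 2]))

def score(problem: str, candidate: str) -> float:
    # Stand-in verifier: negative distance from the checked answer, 408.
    return -abs(int(candidate) - 408)

def best_of_n(problem: str, n: int, seed: int = 0) -> str:
    rng = random.Random(seed)
    candidates = [llm_propose(problem, rng) for _ in range(n)]
    # Spending more inference-time compute (larger n) raises the chance
    # that some candidate maximizes the verifier's score.
    return max(candidates, key=lambda c: score(problem, c))

print(best_of_n("What is 12 * 34?", n=8))
```

Because the n=1 case is just the first sample of the n=32 case under the same seed, the best-of-32 score can never be worse than the single-sample score, which is the monotonicity that makes "more search, better answers" a meaningful scaling axis.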

September 18

Understanding AI Systems at Scale: Applied Interpretability, Agent Robustness, and the Science of Model Behaviors
6:00–7:00 PM, America/New_York
RSVP here: https://forms.gle/XpirPb17Q9HgoshP6

Join two Transluce researchers as they discuss their latest work and research vision. Transluce is a company building the public tech stack for understanding AI systems. Topics will include applied interpretability, scalable oversight, reinforcement learning, and discovering rare behaviors in language models.

September 17

ML for Drug Discovery at Genesis Therapeutics
6:00–7:00 PM, America/New_York
RSVP here: https://forms.gle/V3YRySh1urjvCMSt5

Genesis Therapeutics is an industry-leading startup in the ML-driven drug discovery space. At Genesis, we are integrating ML into many phases of the drug discovery process: generating new molecules, sampling protein-ligand conformations, and predicting properties such as potency and ADME. Genesis has built a state-of-the-art denoising diffusion model for protein-ligand structure prediction, and has also developed ML models for molecular property prediction to accelerate the virtual screening process and de-risk drug discovery programs. Genesis Therapeutics has raised over $300M in funding from top technology and biotech investors. We are hiring for internship and full-time positions in ML research, software engineering, and computational chemistry: https://genesistherapeutics.ai/careers/