[Scale ML] Self-improvement of LLM agents through Reinforcement Learning at Scale
--- Title ---
Self-improvement of LLM agents through Reinforcement Learning at Scale
--- Time ---
12pm Wednesday (Mar 26th), 45-332 (it's on the 3rd floor). Zoom: https://mit.zoom.us/j/91697262920 (password: mitmlscale)
--- Abstract ---
How can we efficiently enhance LLM agents for decision-making at scale? In this talk, we explore our recent efforts in applying autonomous reinforcement learning (RL) to LLM agents, enabling a self-improvement loop without additional human supervision. We begin by introducing DigiRL, the first framework that establishes the key ingredients necessary for applying autonomous RL to realistic digital agent tasks—such as controlling an Android emulator connected to the Internet in the same way a human would. Building on this foundation, we discuss broader implications of autonomous self-improvement. First, we examine how LLM agents can autonomously generate and learn from new tasks using a Proposer-Agent-Evaluator framework, where self-generated tasks drive the development of generalist capabilities. Next, we consider scenarios where direct environment interaction is restricted due to safety concerns and present DigiQ, which introduces key algorithmic modifications to enable constrained yet effective learning. Finally, we discuss the role of humans in shaping self-improving agentic systems, as explored in SWEET-RL. Through these discussions, we aim to highlight the transformative potential of large-scale autonomous RL for LLM agents and outline the challenges and opportunities that lie ahead.
--- Bio ---
Yifei is a Ph.D. student at UC Berkeley advised by Prof. Sergey Levine, and also a visiting researcher at FAIR Meta. His recent research has focused on interactive decision-making and reinforcement learning with LLM agents. In particular, he is interested in large-scale self-improvement through open-ended reinforcement learning without additional supervision.
--- Important Info (free food) ---
- This is a Zoom session since our speaker is remote
- There is a time change: this talk is at 12pm EST. Because of spring break we expect a smaller in-person audience, so we will cater lunch! This is a trial run, and if it goes well we may make our sessions lunch sessions in the future as well
--- Seminar Info ---
We are a cross-lab MIT AI graduate student collective focusing on Algorithms That Learn and Scale. We currently host bi-weekly seminars and will have hands-on sessions and research socials in the future. We are funded by generous donations from Pulkit Agrawal, Yoon Kim, and BVP.