Thesis Defense: Building Strategic AI Agents for Human-centric Multi-agent Systems
Host: Jacob Andreas, MIT CSAIL
Title: "Building Strategic AI Agents for Human-centric Multi-agent Systems"
Date: Tuesday, August 13, 2024
Time: 11:00 AM - 12:00 PM (Eastern Time)
Location:
- In-person: D463 (Star), Stata Center, 32 Vassar Street, Cambridge, MA, 02139
- Virtual: https://mit.zoom.us/j/94031036791
Thesis Supervisor: Jacob Andreas
Thesis Committee: Gabriele Farina, Constantinos Daskalakis, Roger Levy
Abstract:
This thesis addresses the challenge of developing strategic AI agents capable of effective decision-making and communication in human-centric multi-agent systems. While AI for strategic decision-making has made significant progress, building agents that interact seamlessly with humans in multi-agent settings remains difficult. This research examines the limitations of current approaches, such as self-play reinforcement learning (RL) and imitation learning (IL), and proposes novel methods to overcome them.
Modeling human-like communication and decision-making is a crucial first step toward building effective strategic agents. The first part of the thesis addresses this through two approaches. We begin by developing piKL, a regret minimization algorithm for modeling the actions of strong, human-like agents, which incorporates a cost term proportional to the KL divergence between a search policy and a human IL policy. This approach improves reward while keeping behavior close to the human IL policy, producing agents that accurately predict human actions while improving performance in the benchmark game of no-press Diplomacy. We then develop a procedure for modeling populations of agents that communicate with humans in natural language: a sample-efficient multitask training scheme for latent language policies that improves the reward these policies obtain while preserving the semantics of language in a complex real-time strategy game.
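As a concrete (if simplified) illustration, the following LaTeX sketch shows the kind of KL-regularized objective the abstract describes; the symbols u_i, Q_i, tau_i, and lambda are assumed notation for this sketch, not taken from the thesis itself.

```latex
% Hedged sketch of a piKL-style objective (assumed notation):
% agent i trades expected utility against divergence from a fixed
% human imitation-learned anchor policy \tau_i, with strength \lambda.
\[
  \pi_i^{*} \in \arg\max_{\pi_i}\;
    u_i(\pi_i, \pi_{-i}) \;-\; \lambda\, D_{\mathrm{KL}}\!\left(\pi_i \,\middle\|\, \tau_i\right)
\]
% The corresponding smooth best response has the closed form
\[
  \pi_i^{*}(a) \;\propto\; \tau_i(a)\, \exp\!\left( Q_i(a) / \lambda \right),
\]
% so a large \lambda keeps play close to the human policy, while a small
% \lambda lets the agent optimize reward more aggressively.
```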
Building on these foundations, the second part of the thesis focuses on building strategic agents for human-centric multi-agent domains. It introduces the DiL-piKL planning algorithm and its extension, RL-DiL-piKL, which regularize self-play RL and search toward a human IL policy. These algorithms enable the training of Diplodocus, an agent that achieves expert human-level performance in no-press Diplomacy. A significant milestone is reached with Cicero, the first AI agent to achieve human-level performance in full-press Diplomacy, which integrates a language model (LM) with piKL-based planning and RL algorithms.
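To ground the planning primitive these algorithms share, here is a minimal, self-contained Python sketch of a KL-regularized smooth best response; the anchor policy and action values below are invented toy numbers standing in for a human IL policy and search value estimates, not figures from the thesis.

```python
import numpy as np

def kl_regularized_best_response(q_values, anchor_policy, lam):
    """Smooth best response to action values under a KL anchor (sketch).

    Solves max_pi E_pi[Q] - lam * KL(pi || anchor), whose closed form is
    pi(a) proportional to anchor(a) * exp(Q(a) / lam).
    """
    logits = np.log(anchor_policy) + np.asarray(q_values, dtype=float) / lam
    logits -= logits.max()          # subtract max for numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

# Toy example: a human-like anchor policy and search value estimates (made up).
anchor = np.array([0.6, 0.3, 0.1])
q = np.array([0.0, 1.0, 2.0])
print(kl_regularized_best_response(q, anchor, lam=1.0))    # blends reward and anchor
print(kl_regularized_best_response(q, anchor, lam=100.0))  # stays close to the anchor
```

The single knob lam interpolates between pure imitation (large lam) and pure reward maximization (small lam), which is the trade-off the abstract attributes to this family of methods.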
The final part of the thesis returns to language generation tasks, applying piKL to model pragmatic communication and to improve LM truthfulness. It presents Regularized Conventions, a model of pragmatic language understanding that outperforms existing best-response and rational speech act models across several datasets. It then introduces a novel approach to LM decoding that casts decoding as a regularized imperfect-information sequential signaling game, yielding the Equilibrium-Ranking algorithm, which consistently improves on existing LM decoding procedures.
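For intuition, the sketch below caricatures an equilibrium-style reranking loop under loudly stated assumptions: a generator and a discriminator hold policies over the same candidate answers, play an agreement game, and take KL-regularized no-regret steps anchored to their LM-derived priors. Every function name, update rule, and number here is a hypothetical simplification, not the thesis's actual algorithm.

```python
import numpy as np

def softmax(x):
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())        # shift by max for numerical stability
    return e / e.sum()

def equilibrium_rank(gen_logp, disc_logp, lam=1.0, steps=50):
    """Schematic equilibrium-style ranking over candidates (assumed names)."""
    p_gen, p_disc = softmax(gen_logp), softmax(disc_logp)  # LM-derived anchors
    g, d = p_gen.copy(), p_disc.copy()                     # current policies
    q_g = np.zeros_like(g)   # generator's running average payoff per candidate
    q_d = np.zeros_like(d)   # discriminator's running average payoff
    for t in range(1, steps + 1):
        q_g += (d - q_g) / t   # payoff: probability the other player agrees
        q_d += (g - q_d) / t
        # KL-regularized best response: pi(y) proportional to anchor(y) e^{Q(y)/lam}
        g = softmax(np.log(p_gen) + q_g / lam)
        d = softmax(np.log(p_disc) + q_d / lam)
    return np.log(g) + np.log(d)   # rank candidates by the joint score

# Toy usage: three candidate answers scored by two LM "players" (made-up numbers).
scores = equilibrium_rank(gen_logp=[-1.0, -2.0, -0.5], disc_logp=[-0.7, -0.4, -3.0])
print(np.argsort(scores)[::-1])   # candidate indices, best to worst
```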