Programmatic policies: a story on imitation and interpretation

Speaker

Levi Lelis

University of Alberta

Host

Armando Solar-Lezama

CSAIL MIT

Zoom link: https://mit.zoom.us/j/97170862568

Abstract:
In this seminar, we will explore the concept of programmatic policies – computer programs that encode policies – as a solution to sequential decision-making problems. Depending on the domain-specific language used, programmatic policies may offer advantages over other representations, especially in terms of generalization and interpretability. The challenge with programmatic policies lies in searching in vast and often discontinuous program spaces during synthesis. A common technique from the literature involves training a neural oracle using reinforcement learning and then distilling it into a programmatic policy through imitation learning. The limitation of this method arises from a potential representation gap between the oracle and what can be defined as a programmatic policy. Thus, this method is likely most effective when the programmatic policy itself takes the form of a neural network. I will show how program sketches can extend imitation learning, making it suitable for guiding searches beyond just spaces that encode neural networks. To demonstrate the potential of this imitation-based sketch search, we will discuss its performance both in terms of quality and interpretability, when applied to a real-time strategy game. While assessing policy quality is straightforward, our emphasis will be on evaluating interpretability. In this context, I will introduce the LINT score (LLM-based INTerpretability score), an LLM-based metric designed to assess the interpretability of programs.

Bio:
Levi Lelis completed his PhD in 2013 at the University of Alberta in Canada and joined the Universidade Federal de Viçosa in Brazil as a Professor. During his tenure there, he taught a variety of courses and led the establishment of the institution's PhD program in Computing Science. In 2020, he returned to the University of Alberta, where he is currently an Assistant Professor, an Amii fellow, and a CIFAR AI Chair.

Date and Time

August 22, 2023, 11:00 AM to 12:00 PM (America/New_York)

Location

Seminar Room G882 (Hewlett Room)

Organizer & Contact

Amanda Abrams

acabrams@csail.mit.edu
