Thesis Defense: Transparent Value Alignment: Foundations for Human-Centered Explainable AI in Alignment
Host: Julie Shah, MIT CSAIL, MIT AeroAstro
Abstract:
Alignment of robot objectives with those of humans can greatly enhance robots' ability to act flexibly and to meet humans' goals safely and reliably across diverse contexts, from space exploration to robotic manufacturing. However, it is often difficult or impossible for humans, both expert and non-expert, to enumerate their objectives comprehensively, accurately, and in forms that are readily usable for robot planning. Value alignment is an open challenge in artificial intelligence that aims to address this problem by enabling robots and autonomous agents to infer human goals and values through interaction. Providing humans with direct and explicit feedback about this value-learning process through explainable AI (XAI) can enable them to teach robots about their goals more efficiently and effectively. In this talk, I will introduce the Transparent Value Alignment (TVA) paradigm, which captures this two-way communication and inference process, and will discuss foundations for the design and evaluation of XAI within this paradigm. First, I will present a novel suite of metrics for assessing alignment, validated through human-subject experiments that apply approaches from cognitive psychology. Next, I will propose design guidance for XAI within the TVA context, grounded in results from a set of human studies comparing a broad range of explanation techniques across multiple domains and dimensions of complexity. Finally, I will discuss the Situation Awareness Framework for Explainable AI (SAFE-AI), a human factors-based framework for the design and evaluation of XAI across diverse contexts, including alignment. I will additionally highlight how this research relates to real-world robotic manufacturing and space exploration settings that I have studied, and I will conclude by briefly discussing the future vision of this work.
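To make the inference loop behind value alignment concrete, here is a minimal illustrative sketch in Python of one standard formulation of the idea (not the method from this thesis): the robot maintains a Bayesian belief over candidate reward weightings and updates it from observed human choices under a Boltzmann-rational choice model. The hypotheses, features, and numbers below are assumptions invented purely for illustration.

import numpy as np

# Illustrative only: value alignment framed as Bayesian reward inference.
# The robot holds a belief over candidate reward weight vectors and
# updates it from observed human choices, assuming a Boltzmann-rational
# (noisily optimal) human. All names and numbers are made-up assumptions.

# Hypothetical candidate reward weightings over two features,
# e.g., (task speed, safety margin).
hypotheses = np.array([[1.0, 0.0],   # cares only about speed
                       [0.5, 0.5],   # balances both
                       [0.0, 1.0]])  # cares only about safety
belief = np.full(len(hypotheses), 1.0 / len(hypotheses))  # uniform prior

# Feature values for three candidate robot behaviors the human can
# choose among (rows: options; columns: speed, safety).
options = np.array([[0.9, 0.1],
                    [0.5, 0.5],
                    [0.1, 0.9]])

beta = 5.0  # rationality coefficient: higher = more reliably optimal human

def choice_likelihood(chosen, weights):
    """P(human picks `chosen` | reward weights) under a Boltzmann model."""
    utilities = options @ weights
    probs = np.exp(beta * utilities)
    probs /= probs.sum()
    return probs[chosen]

# Simulate the human repeatedly choosing the safety-oriented option and
# update the belief after each observed choice via Bayes' rule.
for _ in range(5):
    observed_choice = 2  # human picks the safest behavior
    likelihoods = np.array([choice_likelihood(observed_choice, w)
                            for w in hypotheses])
    belief = belief * likelihoods
    belief /= belief.sum()

print("Posterior over reward hypotheses:", np.round(belief, 3))

After a few observations the belief concentrates on the safety-oriented hypothesis; an XAI layer in the TVA sense would then surface that inferred objective (or the behavior it implies) back to the human for confirmation or correction, closing the two-way communication loop the abstract describes.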
Thesis Advisor: Julie Shah
Thesis Committee: Julie Shah (Chair), Jessie Chen, Dylan Hadfield-Menell, Dava Newman, and David Pynadath