Techniques for Interpretability and Transparency of Black-Box Models

Speaker

MIT CSAIL

Host

Julie Shah

MIT CSAIL

Abstract: Recently, black-box models such as neural networks have been increasingly adopted in many tasks. However, their opacity, or the inability to understand their inner workings, has hindered their deployment in high-stakes domains such as healthcare and finance. In this talk, I describe my research in interpretability and transparency to address this issue.

In the interpretability category, I introduce two fundamental properties of good explanations for model predictions: correctness and understandability. Correctness captures the notion that explanations should faithfully represent the model’s decision-making logic, and understandability reflects the requirement that these explanations should be reliably understood by human users. For both properties, I propose evaluation metrics as well as methods that improve upon existing ones, while identifying avenues for future work.

In the transparency category, I present the transparency-by-example framework, a Bayesian sampling formulation to inspect models and identify a wide range of model behaviors. I demonstrate the flexibility of this Bayesian approach by applying it to both deep neural networks and non-differentiable robot controllers, revealing hidden and hard-to-find insights in both cases.

December 01 2022, 10:00 AM to 12:00 PM (America/New_York)

Location

Seminar Room D463 (Star)

Organizer & Contact

Yilun Zhou

yilun@mit.edu
