Our aim is to improve the interpretability of deep neural networks so that it becomes possible to understand their decisions, debug their errors, and make systematic improvements to them.
Deep neural networks are self-programming systems that have been remarkably successful, beating the best hand-designed algorithms at recognizing objects in images, translating between languages, and winning games. However, deep nets have the disadvantage that their internal reasoning is obscure. In this project, we develop methods to improve the interpretability of deep networks. The key idea is to identify the specific ways that a network decomposes a larger problem into smaller subproblems, so that its internal behavior can be understood. For example, when a network identifies a highway, it might do so internally by separately detecting cars, road markings, plantings, and signage. Our aim is to visualize, quantify, and enhance this kind of causal internal structure so that programmers can explain decisions, debug errors, and make systematic improvements in deep networks.
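As a concrete illustration of how such internal structure might be quantified, the sketch below scores how well a single hidden unit's strongest activations line up with a human-labeled visual concept such as "car" or "road marking". This is a minimal sketch, assuming that per-image activation maps for one unit and binary concept segmentations are already available; the function name and the 0.995 activation quantile are illustrative choices, not fixed parts of the project's method.

```python
# Illustrative sketch: measure whether one hidden unit behaves like a
# detector for a labeled concept, by overlapping its high-activation
# regions with the concept's segmentation masks. Names are hypothetical.
import numpy as np

def unit_concept_iou(activation_maps, concept_masks, quantile=0.995):
    """Score the overlap between a unit's top activations and a concept.

    activation_maps: float array, shape (num_images, H, W), one unit's
                     (upsampled) activation map per image.
    concept_masks:   bool array, same shape, marking where the concept
                     (e.g. "car") appears in each image.
    Returns the intersection-over-union accumulated over the dataset.
    """
    # Threshold at a high quantile of this unit's activation distribution,
    # so "on" means the unit fires unusually strongly at that location.
    threshold = np.quantile(activation_maps, quantile)
    unit_on = activation_maps > threshold

    intersection = np.logical_and(unit_on, concept_masks).sum()
    union = np.logical_or(unit_on, concept_masks).sum()
    return intersection / union if union > 0 else 0.0
```

A unit with a high score for exactly one concept is a candidate internal "detector" of the kind described above (for example, a car detector inside a highway classifier), and such units are natural targets for visualization and for causal interventions such as ablation.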