Decomposing Predictions by Modeling Model Computation
Harshay Shah
MIT CSAIL
2024-05-02 16:00:00
2024-05-02 16:30:00
America/New_York
Abstract: How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components: simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions; we demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks, namely: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks.
Paper: https://arxiv.org/abs/2404.11534
Blog post: https://gradientscience.org/modelcomponents/
Bio: Harshay is a PhD student at MIT CSAIL, advised by Aleksander Madry. His research interests are broadly in developing tools to understand and steer model behavior. Recently, he has been working on understanding how training data and learning algorithms collectively shape neural network representations.
Room 32-G449 (Patil/Kiva)