Decomposing Predictions by Modeling Model Computation

Speaker

Harshay Shah
MIT CSAIL

Host

Behrooz Tahmasebi
MIT CSAIL
Abstract: How does the internal computation of a machine learning model transform inputs into predictions? In this paper, we introduce a task called component modeling that aims to address this question. The goal of component modeling is to decompose an ML model's prediction in terms of its components -- simple functions (e.g., convolution filters, attention heads) that are the "building blocks" of model computation. We focus on a special case of this task, component attribution, where the goal is to estimate the counterfactual impact of individual components on a given prediction. We then present COAR, a scalable algorithm for estimating component attributions, and demonstrate its effectiveness across models, datasets, and modalities. Finally, we show that component attributions estimated with COAR directly enable model editing across five tasks: fixing model errors, "forgetting" specific classes, boosting subpopulation robustness, localizing backdoor attacks, and improving robustness to typographic attacks.
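
To make the idea of component attribution concrete, here is a minimal toy sketch (not the authors' implementation): if we ablate random subsets of a model's components and record the resulting predictions, we can fit a linear surrogate whose coefficients estimate each component's counterfactual effect. The function `toy_model` and all parameter choices below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components = 50

# Toy stand-in for a model: each component, when active (mask = 1),
# contributes a fixed amount to the prediction. In a real network the
# components would be, e.g., convolution filters or attention heads.
true_effects = rng.normal(size=n_components)

def toy_model(mask):
    # mask[i] = 0 means component i is ablated (zeroed out).
    return float(mask @ true_effects)

# Sample random ablation masks (keep each component with probability 0.9)
# and record the model's output on each ablated variant.
n_samples = 2000
masks = (rng.random((n_samples, n_components)) < 0.9).astype(float)
outputs = np.array([toy_model(m) for m in masks])

# Fit a linear surrogate (least squares with an intercept column):
# attributions[i] estimates the counterfactual effect of component i
# on this prediction.
design = np.column_stack([masks, np.ones(n_samples)])
coefs, *_ = np.linalg.lstsq(design, outputs, rcond=None)
attributions = coefs[:n_components]
```

Because the toy model here is exactly linear in its components, the surrogate recovers the true effects; for a real network the linear fit is only an approximation, which is the regime the abstract's COAR algorithm targets at scale.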

Paper: https://arxiv.org/abs/2404.11534
Blog post: https://gradientscience.org/modelcomponents/

Bio: Harshay is a PhD student at MIT CSAIL, advised by Aleksander Madry. His research interests are broadly in developing tools to understand and steer model behavior. Recently, he has been working on understanding how training data and learning algorithms collectively shape neural network representations.