Model circuit interpretability, and the road to scaling it up

Speaker

Yaniv Nikankin
Technion

Host

Tamar Rott Shaham
MIT

In this talk, we will explore circuit analysis for interpreting neural network models. After some background on the paradigm and techniques of circuit analysis, I'll present two (and a half) research studies demonstrating the breadth of these interpretability methods.

We will explore how this paradigm can yield scientific insight into how neural network models operate, exemplified by the first work ("Arithmetic without Algorithms", https://technion-cs-nlp.github.io/llm-arithmetic-heuristics), where we use circuit analysis to reveal how language models solve arithmetic prompts.

We will also show that circuit analysis can surface problems in neural network models and help fix them, specifically targeting the poor performance of VLMs on visual tasks compared to equivalent textual tasks (studied in the work "Same Task, Different Circuits", https://technion-cs-nlp.github.io/vlm-circuits-analysis).

Lastly, if time permits, we will discuss ongoing work on scaling circuit analysis to complex tasks with non-templatic inputs and long-form outputs, with the goal of decomposing real-world model behaviors.

Schedule a talk with Yaniv: https://docs.google.com/spreadsheets/d/1Jzx6wSp7rpb46WmbLfzA25qi2RyXukG…
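For readers unfamiliar with the circuit-analysis background mentioned in the abstract: these methods typically rely on causal interventions such as activation patching, where an internal activation from a "clean" run is copied into a "corrupted" run to see how much of the model's original behavior is restored. Below is a minimal, self-contained sketch of that idea on a hypothetical toy model; all names, sizes, and inputs are illustrative assumptions, not the speaker's actual models or tooling.

```python
# Minimal sketch of activation patching (illustrative only; the talk's actual
# methods and models are not specified here).
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyModel(nn.Module):
    """Hypothetical two-layer model standing in for a real network component."""
    def __init__(self, d=16):
        super().__init__()
        self.layer1 = nn.Linear(d, d)
        self.layer2 = nn.Linear(d, d)
        self.head = nn.Linear(d, 2)

    def forward(self, x):
        h1 = torch.relu(self.layer1(x))
        h2 = torch.relu(self.layer2(h1))
        return self.head(h2)

model = ToyModel()
clean, corrupted = torch.randn(1, 16), torch.randn(1, 16)

# 1) Run the clean input and cache the activation at layer1 via a forward hook.
cache = {}
def save_hook(module, inputs, output):
    cache["layer1"] = output.detach()

handle = model.layer1.register_forward_hook(save_hook)
clean_logits = model(clean)
handle.remove()

# 2) Run the corrupted input, but patch in the cached clean activation.
# Returning a tensor from a forward hook replaces that module's output.
def patch_hook(module, inputs, output):
    return cache["layer1"]

handle = model.layer1.register_forward_hook(patch_hook)
patched_logits = model(corrupted)
handle.remove()

corrupted_logits = model(corrupted)

# 3) A component is a candidate circuit member if patching it moves the
# corrupted output back toward the clean output. In this toy model the patch
# determines all downstream computation, so restoration is complete; in real
# circuit analysis one patches individual heads/MLPs at individual positions.
restored = (patched_logits - corrupted_logits).norm() / \
           (clean_logits - corrupted_logits).norm()
print(f"fraction of the clean/corrupted gap restored by patching layer1: {restored:.2f}")
```

Components whose patched activations restore the clean behavior are treated as part of the circuit for that behavior; sweeping this intervention over layers and positions is one common way such circuits are mapped.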