[ML+Crypto] Sequences of Logits and the Low Rank Structure of Language Models

Speaker

Noah Golowich (Microsoft Research NYC)

Hosts

Shafi, Yael, Jonathan and Vinod

Time: Tuesday, November 18, 10:30–12:30

Location: 32-G575

Seminar series: ML+Crypto Seminar

A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to studying the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first demonstrate empirically that a wide range of modern language models exhibits low-rank structure: in particular, matrices built from a model's logits for varying sets of prompts and responses have low approximate rank. Taking a theoretical perspective, we then show that any distribution over sequences with such low approximate logit rank can be provably learned in polynomial time using polynomially many queries to the model's logits. Finally, we show that insights from this low-rank perspective can be leveraged for generation: in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical, prompts. This new generation procedure may have applications in AI alignment; for instance, it could yield new approaches for constructing jailbreaks.
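To make the two linear-algebraic claims in the abstract concrete, here is a minimal, self-contained numpy sketch, not the speakers' algorithm. It plants a rank-`DIM` bilinear "model" behind a toy `get_logits(prompt, prefix)` oracle (a stand-in for real logit queries; all names, shapes, and the least-squares fitting step are assumptions for illustration), measures the approximate rank of a logit matrix via SVD, and reproduces a target prompt's logits as a linear combination of logits from unrelated prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50  # toy vocabulary size
DIM = 4     # planted logit rank, so the low-rank phenomenon holds by construction

_prompt_emb, _prefix_map = {}, {}

def _emb(cache, key, shape):
    # Deterministic random embedding per string key.
    if key not in cache:
        cache[key] = rng.standard_normal(shape)
    return cache[key]

def get_logits(prompt: str, prefix: str) -> np.ndarray:
    """Toy stand-in for a real LM's next-token logits: a rank-DIM bilinear
    model. Replace with actual logit queries to test a real model."""
    u = _emb(_prompt_emb, prompt, (DIM,))
    W = _emb(_prefix_map, prefix, (DIM, VOCAB))
    return u @ W

def logit_matrix(prompts, prefixes):
    """Rows index prompts; columns index (prefix, vocab-token) pairs,
    mirroring 'matrices built from the model's logits for varying sets
    of prompts and responses'."""
    return np.stack([np.concatenate([get_logits(p, x) for x in prefixes])
                     for p in prompts])

def approximate_rank(M, eps=1e-6):
    """Smallest k whose best rank-k approximation (via SVD) is within a
    relative Frobenius error of eps."""
    s = np.linalg.svd(M, compute_uv=False)
    resid = np.sqrt(np.cumsum((s ** 2)[::-1]))[::-1]  # resid[k] = ||M - M_k||_F
    resid = np.append(resid, 0.0)                     # full rank leaves no residual
    return int(np.argmax(resid <= eps * np.linalg.norm(s)))

def combine_logits(target, sources, fit_prefixes, new_prefix):
    """Fit coefficients c so that sum_i c_i * logits(source_i, .) matches the
    target prompt's logits on fit_prefixes, then reuse c on a new prefix."""
    A = logit_matrix(sources, fit_prefixes).T   # (fit_dim, num_sources)
    b = np.concatenate([get_logits(target, x) for x in fit_prefixes])
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    preds = np.stack([get_logits(p, new_prefix) for p in sources])
    return c @ preds                            # approximate target logits

if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(8)]
    prefixes = [f"prefix {j}" for j in range(6)]
    M = logit_matrix(prompts, prefixes)
    print("approximate rank:", approximate_rank(M))  # DIM in this toy model
    approx = combine_logits(prompts[0], prompts[1:6], prefixes[:4], prefixes[5])
    true = get_logits(prompts[0], prefixes[5])
    print("relative error:", np.linalg.norm(approx - true) / np.linalg.norm(true))
```

In this toy setting the rank check recovers the planted dimension and the linear combination is near-exact, because the target prompt's embedding lies in the span of the source embeddings; for a real model, one would swap in actual logit queries and expect only approximate low rank.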

Based on joint work with Allen Liu and Abhishek Shetty.