[Scale ML] Songlin Yang: Linear Attention and Beyond
Speaker: Songlin Yang
Topic: Linear Attention and Beyond
Date: Wednesday, Mar 5
Time: 4:00 PM (EST)
Zoom: https://mit.zoom.us/j/91697262920 (password: mitmlscale)
Abstract: Recently, there has been rapid progress in the field of linear attention, including Lightning Attention (featured in the recent LLM MiniMax-01 with GPT-4o-level performance), RetNet, Mamba2, xLSTM, GLA, DeltaNet, RWKV-7, TTT, and Titans, among others. In this talk, I will discuss the development and efficient training of linear attention models, explore the connections among these models, and conclude by deriving them in a principled manner through the lens of in-context meta-learning and gradient-based optimization.
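For attendees new to the area, the sketch below illustrates the two update rules the abstract alludes to: vanilla linear attention accumulates an additive matrix-valued state, while a DeltaNet-style delta rule replaces that additive write with a gradient step on the in-context regression loss L_t(S) = 0.5 * ||S k_t - v_t||^2. This is a minimal toy sketch under assumed shapes and names (not the speaker's implementation; function names, beta, and dimensions are illustrative only).

import numpy as np

def linear_attention(q, k, v):
    # Causal linear attention as a recurrence:
    #   S_t = S_{t-1} + v_t k_t^T,   o_t = S_t q_t
    # q, k, v: (T, d) arrays; returns (T, d).
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros_like(v)
    for t in range(T):
        S += np.outer(v[t], k[t])   # additive "write" to the state
        out[t] = S @ q[t]           # "read" with the query
    return out

def delta_rule_attention(q, k, v, beta=0.5):
    # DeltaNet-style update: one gradient step on
    #   L_t(S) = 0.5 * ||S k_t - v_t||^2, i.e.
    #   S_t = S_{t-1} - beta * (S_{t-1} k_t - v_t) k_t^T
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros_like(v)
    for t in range(T):
        S -= beta * np.outer(S @ k[t] - v[t], k[t])  # error-correcting write
        out[t] = S @ q[t]
    return out

# Toy usage: sequence length 8, head dimension 4 (both arbitrary).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention(q, k, v).shape, delta_rule_attention(q, k, v).shape)

Both variants cost O(d^2) per token and carry a fixed-size state, which is the source of linear attention's efficiency relative to softmax attention's growing KV cache.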
Bio: Songlin is a second-year Ph.D. student at MIT CSAIL, where she is advised by Prof. Yoon Kim. Her research focuses on hardware-aware algorithm design for sequence modeling, particularly linear attention models.