DeltaNet and Beyond: The Next Generation of Scalable RNNs

Speaker

Songlin Yang
MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)

Host

Shannon Shen
MIT CSAIL

Abstract: 
Hardware-efficient variants of RNNs are receiving renewed attention for their scalability in training and inference, offering an attractive alternative to self-attention. Notable examples include Mamba, RWKV, GLA, DeltaNet, and xLSTM. In this talk, I will introduce DeltaNet, a linear RNN that combines strong in-context retrieval and state tracking with hardware-efficient training. I will motivate DeltaNet from an in-context learning perspective and discuss strategies for scaling it effectively. The talk will also explore DeltaNet's connections to recent developments such as test-time training (TTT) and Titans, along with emerging extensions including Gated DeltaNet, RWKV-7, DeltaProduct, Longhorn, and the Mesa layer.
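
As background for readers unfamiliar with DeltaNet, a minimal sketch of the delta-rule state update it is built on is given below; the notation (matrix state S_t, key k_t, value v_t, query q_t, write strength \beta_t) follows the standard linear-attention literature and is not taken from this announcement.

\[
S_t \;=\; S_{t-1} - \beta_t \left( S_{t-1} k_t - v_t \right) k_t^{\top}
     \;=\; S_{t-1}\left( I - \beta_t k_t k_t^{\top} \right) + \beta_t v_t k_t^{\top},
\qquad o_t = S_t q_t .
\]

Unlike vanilla linear attention, which only accumulates \( v_t k_t^{\top} \) into the state, the delta rule first erases the value currently stored under key k_t before writing the new one, which is the mechanism behind the stronger in-context retrieval mentioned in the abstract; Gated DeltaNet additionally scales S_{t-1} by a learned decay factor.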

Bio: 
Songlin Yang is a second-year Ph.D. student at MIT CSAIL, advised by Prof. Yoon Kim. Her research focuses on hardware-aware algorithms for efficient sequence modeling, with a particular emphasis on linear attention models. She is the lead contributor to the Flash Linear Attention library.