DeltaNet and Beyond: The Next Generation of Scalable RNNs

Speaker

Songlin Yang
MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)

Host

Shannon Shen
MIT CSAIL

Abstract: 
Hardware-efficient variants of RNNs are receiving renewed attention for their scalability in training and inference, offering an attractive alternative to self-attention. Notable examples include Mamba, RWKV, GLA, DeltaNet, and xLSTM. In this talk, I will introduce DeltaNet, a linear RNN that combines strong in-context retrieval and state tracking with hardware-efficient training. I will motivate DeltaNet from an in-context learning perspective and discuss strategies for scaling it effectively. The talk will also explore DeltaNet's connections to recent developments such as test-time training (TTT) and Titans, along with emerging extensions including Gated DeltaNet, RWKV-7, DeltaProduct, Longhorn, and the Mesa layer.
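
As background for readers unfamiliar with DeltaNet, a minimal sketch of the delta-rule state update it is built on is given below; the notation (matrix state S_t, key k_t, value v_t, query q_t, write strength \beta_t) follows the standard linear-attention literature and is not taken from this announcement.

\[
S_t \;=\; S_{t-1} - \beta_t \left( S_{t-1} k_t - v_t \right) k_t^{\top}
     \;=\; S_{t-1}\left( I - \beta_t k_t k_t^{\top} \right) + \beta_t v_t k_t^{\top},
\qquad o_t = S_t q_t .
\]

Unlike vanilla linear attention, which only accumulates \( v_t k_t^{\top} \) into the state, the delta rule first erases the value currently stored under key k_t before writing the new one, which is the mechanism behind the stronger in-context retrieval mentioned in the abstract; Gated DeltaNet additionally scales S_{t-1} by a learned decay factor.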

Bio: 
Songlin Yang is a second-year Ph.D. student at MIT CSAIL, advised by Prof. Yoon Kim. Her research focuses on hardware-aware algorithms for efficient sequence modeling, with a particular emphasis on linear attention models. She is the lead contributor to the Flash Linear Attention library.