H-Nets: Dynamic Chunking for End-to-End Hierarchical Sequence Modeling

Speaker

Cartesia AI (https://cartesia.ai/)

Host

MIT CSAIL, AI@MIT Reading Group

RSVP here: https://forms.gle/Es9S5VtgiGA8fvkd9 

Dynamic chunking (arXiv:2507.07955) is a recently introduced method for training end-to-end hierarchical sequence models (H-Nets). H-Nets model sequences while explicitly chunking them into higher-order concepts, allowing them to train on raw UTF-8 bytes without tokenization and to learn sparser language representations than subword tokens provide. On language modeling, H-Nets outperform a standard tokenized Transformer baseline and match the performance of a much larger Transformer. They also perform strongly in other language modeling settings, such as Chinese and code, while learning data-dependent, context-aware chunking schemes. In this talk, we discuss the methodology and ideas behind H-Net, and share some motivation and context for the work.
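To give a flavor of the idea, here is a minimal, illustrative sketch of similarity-based chunking over a sequence of byte representations. It is not the H-Net method itself (whose routing module is learned end-to-end as part of the network); it only shows the general principle of placing chunk boundaries where adjacent representations disagree. The function name, threshold, and toy vectors are all hypothetical.

```python
# Toy sketch: place a chunk boundary wherever consecutive byte
# representations point in dissimilar directions. Illustrative only --
# H-Net learns its boundary predictor end-to-end; this is a hand-set
# stand-in using cosine similarity.
import numpy as np

def chunk_boundaries(reps: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Return indices where a new chunk starts.

    reps: (T, d) array of per-position representations.
    A boundary score in [0, 1] is derived from the cosine similarity of
    each pair of neighbors; position 0 always starts a chunk.
    """
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    sims = (normed[:-1] * normed[1:]).sum(axis=1)      # cosine of neighbors
    boundary_scores = (1.0 - sims) / 2.0               # high when dissimilar
    starts = [0] + [i + 1 for i, s in enumerate(boundary_scores)
                    if s >= threshold]
    return starts

# Hypothetical 2-d representations: the first two positions are nearly
# parallel, then the direction flips, signaling a new chunk.
reps = np.array([
    [1.0, 0.0], [1.0, 0.1],     # chunk 1
    [-1.0, 0.0], [-1.0, 0.1],   # chunk 2 (direction flip -> boundary)
])
print(chunk_boundaries(reps))   # -> [0, 2]
```

In the actual model, boundary decisions like these control how many byte positions are compressed into each higher-level chunk, which is what lets the hierarchy operate on sparser, data-dependent units instead of fixed subword tokens.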