CSAIL Forum with Prof Yoon Kim: Efficient and Expressive Architectures for Language Modeling
Speaker: Yoon Kim, Assistant Professor, CSAIL
Tuesday, April 22, 2025, 12:00-1:00 PM EDT
Live stream via Zoom; registration required
Abstract:
Transformers are the dominant architecture for language modeling (and generative AI more broadly). The attention mechanism is core to the architecture and enables accurate sequence modeling at scale. However, the complexity of attention is quadratic in input length, which makes it difficult to apply Transformers to long sequences. Moreover, Transformers have theoretical limitations on the class of problems they can solve, which prevents them from modeling certain kinds of phenomena such as state tracking. This talk will describe recent work on efficient alternatives to Transformers that can overcome these limitations.
Bio:
Yoon Kim is an assistant professor at MIT EECS and a principal investigator at CSAIL, where he works on natural language processing and machine learning. He obtained his Ph.D. in computer science from Harvard University.