EECS Special Seminar: Simran Arora, "Pareto-efficient AI systems: Expanding the quality and efficiency frontier of AI"

Speaker

Simran Arora

Stanford University

Host

Tim Kraska & Piotr Indyk

CSAIL

Abstract: My work focuses on expanding the AI capabilities we can achieve under any compute constraint. In this talk, we build up, piece by piece, to a simple language model architecture that expands the Pareto frontier between quality and throughput efficiency. The Transformer, AI’s workhorse architecture, is memory hungry, which limits its throughput. This has led to a Cambrian explosion of alternative architectures proposed in prior work. Prior work paints an exciting picture: some architectures are asymptotically faster than the Transformer while also matching its quality. However, I ask: if we use asymptotically faster building blocks, what, if anything, are we giving up in quality?
1. In part one, we characterize the tradeoff space and show that there is no free lunch. I present my work identifying and explaining the fundamental quality and efficiency tradeoffs between different classes of language model architectures.
2. In part two, we measure how existing architecture candidates fare along the tradeoff space. While many proposed architectures are asymptotically fast, they struggle to achieve wall-clock speedups over Transformers. I present my work on ThunderKittens, a GPU programming library that makes it easier for AI researchers to develop hardware-efficient algorithms.
3. In part three, we expand the Pareto frontier of the tradeoff space. I present the BASED architecture, which is built from simple, hardware-efficient components. Culminating this work, I released a suite of Transformer-free language models, ranging from 8B to 405B parameters, that are state of the art on standard evaluations, all developed on an academic budget.

Given the massive investment in AI models, this work blending AI and systems has had significant impact and adoption in research, open source, and industry.

Bio: Simran Arora is a PhD student at Stanford University advised by Chris Ré. Her research blends AI and systems to expand the Pareto frontier between AI capabilities and efficiency. Her machine learning research has appeared as Oral and Spotlight presentations at NeurIPS, ICML, and ICLR, including an Outstanding Paper award at NeurIPS and Best Paper awards at the ICML ES-FoMo and ICLR DL4C workshops. Her systems work has appeared at VLDB, SIGMOD, CIDR, and CHI, and her systems artifacts are widely used in research, open source, and industry. Her work has also been cited in the US Department of Homeland Security S&T report and won the 2024 Best Cybersecurity Paper Award from the NSA. In 2023, Simran created and taught the CS229s Systems for Machine Learning course at Stanford. She has also been supported by an SGF Sequoia Fellowship and the Stanford Computer Science Graduate Fellowship.

Date

April 14, 2025, 11:00 AM to 12:00 PM (America/New_York)

Location

TBD

Organizer & Contact

Fern Keniston

fern@csail.mit.edu

Part of

EECS Special Seminar

Related Events

April 23

EECS Special Seminar: Tijana Zrnic, "AI-Assisted Approaches to Data Collection and Inference"

April 08

EECS Special Seminar: Evan Johnson, "Preserving Language-level Security in Real Systems"