Revisiting the Economics of Large Language Models with Neural Scaling Laws and Dynamic Sparsity

Speaker

Anshumali Shrivastava
Rice University

Host

Nir Shavit
MIT CSAIL

Abstract: Neural Scaling Laws informally state that increased model size
and data automatically improve AI. However, we have reached a tipping
point where the cost and energy associated with AI are becoming
prohibitive.

This talk will demonstrate algorithmic progress that can exponentially
reduce the compute and memory cost of training and inference by using
"dynamic sparsity" in neural networks. Dynamic sparsity, unlike static
sparsity, aligns with Neural Scaling Laws: it preserves the power of
neural networks while reducing the number of FLOPs they require by 99%
or more. We will show how data structures, particularly randomized hash
tables, can be used to design an efficient "associative memory" that
reduces the number of multiplications required to train neural networks.
Current implementations of this idea challenge the prevailing view in
the community that specialized processors like GPUs are significantly
superior to CPUs for training large neural networks. The resulting
algorithm is orders of magnitude cheaper and more energy-efficient. Our
careful implementations can train billion-parameter recommendation and
language models on commodity desktop CPUs significantly faster than
top-of-the-line TensorFlow alternatives on the most powerful A100 GPU
clusters, with the same or better accuracy. A minimal sketch of the
hash-table idea follows.
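
To make the associative-memory idea concrete, here is a minimal NumPy
sketch (an illustration, not the speaker's actual implementation): each
neuron's weight vector is indexed in SimHash (signed random projection)
tables, and for a given input only the neurons retrieved from those
tables are multiplied. All names and parameters (SimHashTable,
SparseLayer, num_tables, num_bits) are illustrative assumptions.

    # Illustrative sketch of hash-table-based dynamic sparsity (not the
    # speaker's implementation). Neurons are indexed by SimHash fingerprints
    # of their weight vectors; a forward pass multiplies only the retrieved,
    # "active" neurons instead of the full layer.
    import numpy as np

    class SimHashTable:
        """One hash table keyed by signed-random-projection fingerprints."""
        def __init__(self, dim, num_bits, rng):
            # Random hyperplanes; the sign pattern of projections is the bucket key.
            self.planes = rng.standard_normal((num_bits, dim))
            self.buckets = {}  # fingerprint -> list of neuron ids

        def fingerprint(self, vec):
            bits = (self.planes @ vec) > 0
            return int(np.packbits(bits).tobytes().hex(), 16)

        def insert(self, neuron_id, weight_vec):
            self.buckets.setdefault(self.fingerprint(weight_vec), []).append(neuron_id)

        def query(self, input_vec):
            # Neurons hashed to the same bucket as the input are likely to have
            # large inner products with it (i.e., large activations).
            return self.buckets.get(self.fingerprint(input_vec), [])

    class SparseLayer:
        """A wide fully connected layer that evaluates only hash-retrieved neurons."""
        def __init__(self, dim_in, dim_out, num_tables=8, num_bits=10, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((dim_out, dim_in)) / np.sqrt(dim_in)
            self.tables = [SimHashTable(dim_in, num_bits, rng) for _ in range(num_tables)]
            for table in self.tables:
                for j in range(dim_out):
                    table.insert(j, self.W[j])

        def forward(self, x):
            # The union of candidates across tables acts as the "associative
            # memory": a small, input-dependent set of neurons to evaluate.
            active = sorted({j for t in self.tables for j in t.query(x)})
            out = np.zeros(self.W.shape[0])
            if active:
                out[active] = self.W[active] @ x  # multiply only active neurons
            return out, active

    if __name__ == "__main__":
        layer = SparseLayer(dim_in=128, dim_out=10_000)
        x = np.random.default_rng(1).standard_normal(128)
        _, active = layer.forward(x)
        print(f"evaluated {len(active)} of 10000 neurons "
              f"(~{100 * len(active) / 10_000:.1f}% of the dense FLOPs)")

With 10-bit fingerprints and 8 tables over 10,000 neurons, a typical
input retrieves well under 1% of the neurons, which is where the FLOP
reduction comes from; in practice the tables would also be rebuilt
periodically as the weights change during training.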

We will show some demos, including training a billion-parameter language
model from scratch on a laptop for search, discovery, and summarization.

Bio: Anshumali Shrivastava is an associate professor in the computer
science department at Rice University. He is also the Founder and CEO of
ThirdAI Corp, a startup focused on democratizing Mega-AI models through
"dynamic sparsity". His broad research interests include probabilistic
algorithms for resource-frugal deep learning. In 2018, Science News
named him one of the top 10 scientists under 40 to watch. He is a
recipient of the National Science Foundation CAREER Award, a Young
Investigator Award from the Air Force Office of Scientific Research, a
machine learning research award from Amazon, and a Data Science Research
Award from Adobe. He has won numerous paper awards, including Best Paper
Awards at NIPS 2014 and MLSys 2022, and the Most Reproducible Paper
Award at SIGMOD 2019. His work on efficient machine learning
technologies on CPUs has been covered by the popular press, including
the Wall Street Journal, the New York Times, TechCrunch, NDTV, Engadget,
and Ars Technica.