BLAS-on-flash: An alternative for large-scale ML training and inference?

Speaker

Microsoft Research India

Host

Julian Shun
MIT CSAIL
Abstract:
Many large-scale machine learning training and inference tasks are memory-bound rather than compute-bound; that is, on large data sets, the working set of these algorithms does not fit in memory for jobs that could run overnight on a few multi-core processors. This often forces an expensive redesign of the algorithm for distributed platforms such as parameter servers and Spark. BLAS-on-flash provides an inexpensive and efficient alternative based on the observation that many ML tasks admit algorithms that can be programmed with linear algebra subroutines. Our library supports a BLAS and sparse BLAS interface on large SSD-resident matrices, enabling multi-threaded code to scale to industrial-scale datasets on a single workstation. Using BLAS-on-flash, we are able to process 10x larger models on 10x larger inputs in the same memory envelope in two key production pipelines: training large-scale topic models and inference for extreme multi-label learning. This suggests that our approach could be an efficient alternative to expensive distributed big-data systems for scaling up structurally complex machine learning tasks.

In this talk, we will take a look at the BLAS-on-flash API, the design and implementation of the runtime, and the above-mentioned case studies in detail.
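To give a flavor of what programming against such an interface might look like, here is a minimal C++ sketch. The `flash_matrix` type, the `gemm` signature, and the tiling plan described in the comments are illustrative assumptions only, not the library's actual API; the point is that the call site reads like ordinary BLAS while the operands stay on SSD.

```cpp
// Sketch only: flash_matrix, gemm, and the tiling plan are illustrative
// assumptions about how a flash-resident BLAS interface could be used,
// not the actual BLAS-on-flash API.
#include <cstdint>
#include <iostream>
#include <string>

// Handle to a float matrix stored in a file on SSD rather than in DRAM.
struct flash_matrix {
  std::string path;  // backing file on the SSD
  uint64_t rows;
  uint64_t cols;
};

// Hypothetical GEMM over file-backed operands: C = alpha*A*B + beta*C.
// A real runtime would stream fixed-size tiles of A and B into a bounded
// DRAM buffer, multiply them with an in-memory BLAS, and write C tiles
// back to disk; this stub only prints the plan so the example runs.
void gemm(const flash_matrix& A, const flash_matrix& B,
          const flash_matrix& C, float alpha, float beta) {
  std::cout << "gemm: " << A.path << " (" << A.rows << "x" << A.cols << ") * "
            << B.path << " (" << B.rows << "x" << B.cols << ") -> "
            << C.path << ", alpha=" << alpha << ", beta=" << beta << "\n";
}

int main() {
  // Matrices far larger than DRAM are described only by their on-disk files.
  flash_matrix A{"A.bin", 1u << 20, 4096};
  flash_matrix B{"B.bin", 4096, 4096};
  flash_matrix C{"C.bin", 1u << 20, 4096};
  // The call site looks like ordinary BLAS; the working set stays on flash.
  gemm(A, B, C, /*alpha=*/1.0f, /*beta=*/0.0f);
  return 0;
}
```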

Relevant Paper:
Subramanya, Suhas Jayaram, et al. "BLAS-on-flash: An Efficient Alternative for Large Scale ML Training and Inference?" NSDI 2019.