Sketching and Streaming Algorithms

Speaker: Jelani Nelson , M.I.T.
Date: March 10 2011
Time: 4:00PM to 5:00PM
Location: 32-G449
Host: Nancy Lynch and Rob Miller, CSAIL
Contact: Francis Doughty, 253-4602, doughty@mit.edu
In recent years, the amount of new data created in the world has grown
exponentially faster than both network bandwidth and available
storage. To overcome this problem, it is important to develop methods
for producing "sketches" of data that are (much) shorter than the
original files. Access to the sketch alone should enable answering
meaningful queries about the data. Another desirable property is that
the sketches across data should be composable, in order to compute
aggregate statistics and distance and similarity measures.
Furthermore, many applications require that these sketches be created
by "streaming algorithms", i.e. algorithms given only one (or few)
passes over the data.
In this talk, I will discuss several streaming algorithms that produce
sketches exponentially smaller than the original data size. These
sketches allow for estimating the number of distinct items in a data
stream, as well as norm estimation, empirical entropy estimation, and
fast dimensionality reduction of count vectors being updated in a
stream. The dimensionality reduction sketch can also be plugged into
approximate numerical linear algebra algorithms to produce fast,
low-memory algorithms for approximate linear regression and low-rank
approximation given only one (or few) passes over a large matrix.
Bio: Jelani Nelson is a PhD candidate in the Theory of Computation
group at MIT CSAIL. His main research interests are algorithms,
especially for massive datasets, and pseudorandomness.
See other events that are part of CS Special Seminar Series 2010/2011
See other events happening in March 2011