Sampling Sketches for Concave Sublinear Functions of Frequencies

Speaker

Ofir Geri

Stanford

Host

Govind Ramnarayan, Quanquan Liu, Sitan Chen

We consider massive distributed datasets that consist of elements that are key-value pairs. Our goal is to compute estimates of statistics or aggregates over the data, where the contribution of each key is weighted by a function of its frequency (sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning.

A common approach is to maintain a sample of keys and estimate statistics from the sample. Ideally, to obtain low-variance estimates we sample keys with probabilities proportional to their contributions. One simple way to do so is to first aggregate the raw data to produce a table of keys and their frequencies, apply our function to the frequency values, and then apply a weighted sampling scheme. This aggregation however requires data structures of size proportional to the number of distinct keys and is too costly when the number is very large. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies (including log, the moments x^p for 0

Joint work with Edith Cohen.

Add to Calendar 2019-05-15 15:00:00 2019-05-15 16:00:00 America/New_York Sampling Sketches for Concave Sublinear Functions of Frequencies We consider massive distributed datasets that consist of elements that are key-value pairs. Our goal is to compute estimates of statistics or aggregates over the data, where the contribution of each key is weighted by a function of its frequency (sum of values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning.A common approach is to maintain a sample of keys and estimate statistics from the sample. Ideally, to obtain low-variance estimates we sample keys with probabilities proportional to their contributions. One simple way to do so is to first aggregate the raw data to produce a table of keys and their frequencies, apply our function to the frequency values, and then apply a weighted sampling scheme. This aggregation however requires data structures of size proportional to the number of distinct keys and is too costly when the number is very large. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies (including log, the moments x^p for 0Joint work with Edith Cohen. 32-G575

Organizer & Contact

Rebecca Yadegar

ryadegar@csail.mit.edu

Part of

Algorithms & Complexity Seminars 2018-2019

Sampling Sketches for Concave Sublinear Functions of Frequencies

Speaker

Host

May 15 2019

Location

Organizer & Contact

Part of

November 09

Simple and Efficient Algorithm for Parallel Matchings

June 05

The Sample Complexity of Toeplitz Covariance Estimation

Sampling Sketches for Concave Sublinear Functions of Frequencies

Speaker

Host

May 15 2019

Location

Organizer & Contact

Part of

Related Events

November 09

Simple and Efficient Algorithm for Parallel Matchings

June 05

The Sample Complexity of Toeplitz Covariance Estimation