Learning patterns in Big Data from small data using core-sets

Speaker: Dan Feldman , MIT CSAIL
Date: April 5 2012
Time: 2:30PM to 3:30PM
Location: 32-D507
Host: Polina Golland, CSAIL
Contact: Polina Golland, x38005, polina@mit.edu
Relevant URL: When we need to solve an optimization problem we usually use the best
available algorithm/software or try to improve it. In recent years we
have started exploring a different approach: instead of improving the
algorithm, reduce the input data and run the existing algorithm on the
reduced data to obtain the desired output much faster.
A core-set for a given problem is a compressed representation of
its input, in the sense that a solution for the problem with the
(small) coreset as input yields an approximate solution to the
problem with the original (large) input. Popular techniques such as
compressed sensing, sketches, PAC-learning and importance sampling
are special cases of core-sets.
A core-set usually turns inefficient non-parallel in-memory algorithms
into linear time practical algorithms, which can be computed using a
manageable amount of memory in parallel (say, using Hadoop, cloud
service, or GPUs), via one pass over terabytes of input data.
I will explain how and when this magical new paradigm works, and will
show recent applications in machine learning, computer vision, text
mining, and smartphone data analysis. My hope is that the talk will
help you to construct core-sets for problems in your current research.
See other events that are part of Biomedical Imaging and Analysis 2011/2012
See other events happening in April 2012