Our goal is to design novel data compression techniques to accelerate popular machine learning algorithms in Big Data and streaming settings.
Popular machine learning algorithms are computationally expensive, or worse yet, intractable to train on Big Data. We are developing efficient algorithms for constructing coresets — small weighted subsets of the input points that provably approximate the original data set — to accelerate machine learning algorithms such as Neural Networks and Support Vector Machines. Our goal is to construct compact and representative coresets such that models trained on the small coreset are provably competitive with those trained on the original (massive) data set.
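To make the idea concrete, here is a minimal, hypothetical sketch of one standard coreset construction, sensitivity-based importance sampling, applied to the simple problem of approximating the data mean. This is an illustration of the general technique only, not the specific algorithms developed in this project; the sensitivity bound used below (`1/n` plus each point's share of the total squared distance to the mean) is a classical choice for this toy problem.

```python
import numpy as np

def mean_coreset(X, m, rng):
    """Sample a weighted coreset of m points for the 1-mean problem.

    Each point x_i is sampled with probability proportional to a
    sensitivity upper bound
        s_i = 1/n + ||x_i - mu||^2 / sum_j ||x_j - mu||^2,
    and assigned weight 1/(m * p_i), which makes weighted sums over
    the coreset unbiased estimators of sums over the full data set.
    """
    n = len(X)
    mu = X.mean(axis=0)
    d2 = ((X - mu) ** 2).sum(axis=1)
    s = 1.0 / n + d2 / (d2.sum() + 1e-12)  # guard against all-equal data
    p = s / s.sum()
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return X[idx], weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2)) + np.array([3.0, -1.0])
coreset, w = mean_coreset(X, m=500, rng=rng)

# The weighted mean of the 500-point coreset approximates the
# mean of all 5000 points.
approx_mean = (w[:, None] * coreset).sum(axis=0) / w.sum()
```

The same sample-and-reweight pattern underlies coresets for much harder objectives (clustering, regression, SVMs, neural networks); what changes is the sensitivity bound, which must upper-bound each point's maximum relative contribution to the objective.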
This work is done in collaboration with Dan Feldman from the University of Haifa.