This project involves designing efficient algorithms and proving lower bounds for fundamental problems in computational models that address big data.

The recent availability of large data sets has had a significant impact on the design of algorithms and has led to the emergence of computational models that capture various aspects of massive data computations. Examples of these models include streaming algorithms, sublinear time algorithms, sublinear query time algorithms with preprocessing, property testing algorithms, and distributed algorithms. We present algorithms and prove lower bounds for problems in these models. Examples of the problems include:

* Approximate Nearest Neighbor: given a set of points in high dimensions, the goal is to preprocess them so that, given a query point, we can find the closest point in the data set to the query in sublinear time. This problem has numerous applications in information retrieval, computer vision, clustering, etc.

* Composable Coresets: an approach to efficiently summarizing a data set so that the summaries have the composability property. That is, given multiple data sets, the union of their summaries should be a valid summary of the union of the data sets. This notion has applications in distributed models, streaming algorithms, etc.

* The Set Cover problem: given a set of elements and a collection of sets over these elements, the goal is to choose the minimum number of sets that cover all the elements. This problem has many applications in information retrieval, operations research, machine learning, etc. We consider this problem in the sublinear time model (where one computes the answer without reading the whole input) and the streaming model (where one makes a small number of passes over the data using small storage).
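To make the Set Cover problem concrete, here is a minimal sketch of the classical greedy approximation, which repeatedly picks the set covering the most still-uncovered elements and achieves a ln(n) approximation. This is a textbook baseline for illustration only, not one of the sublinear-time or streaming algorithms studied in this project; the function and variable names are our own.

```python
def greedy_set_cover(universe, sets):
    """Greedy approximation for Set Cover: while elements remain
    uncovered, choose the set covering the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # index of the set with the largest marginal coverage
        best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
        if not uncovered & sets[best]:
            raise ValueError("some elements cannot be covered")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Toy instance: 6 elements, 5 candidate sets.
universe = {1, 2, 3, 4, 5, 6}
sets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}, {5, 6}]
cover = greedy_set_cover(universe, sets)  # indices of the chosen sets
```

Note that the greedy algorithm reads the whole input; the sublinear-time and streaming settings mentioned above ask how well one can do without that luxury.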
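The composability property can be illustrated with a deliberately simple toy summary: keeping the k largest values of each data set. The union of per-shard summaries is itself a valid summary of the union, so shards can be summarized independently (e.g., on different machines) and merged. This is only a sketch of the composability idea with invented names, not one of the coresets considered in this project.

```python
import heapq

def summarize(points, k):
    """Summary of one data set: its k largest values."""
    return heapq.nlargest(k, points)

def compose(summaries, k):
    """Merge summaries of several data sets into a summary of their union.
    Composability: the k largest of the union are guaranteed to appear
    among the union of the per-set summaries."""
    merged = [x for s in summaries for x in s]
    return heapq.nlargest(k, merged)

# Three shards of a distributed data set, summarized independently.
shards = [[5, 1, 9], [3, 8, 2], [7, 7, 0]]
summaries = [summarize(s, 2) for s in shards]
top2 = compose(summaries, 2)  # summary of the full data set
```

The same pattern underlies distributed and streaming uses of composable coresets: local summaries are small, and merging them never loses the answer.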
