To handle big data, shrink it: algorithm reduces size of data sets while preserving their mathematical properties.

As anyone who’s ever used a spreadsheet can attest, it’s often convenient to organize data into tables. But in the age of big data, those tables can be enormous, with millions or even hundreds of millions of rows.

One way to make big-data analysis computationally practical is to reduce the size of data tables — or matrices, to use the mathematical term — by leaving out a bunch of rows. The trick is that the remaining rows have to be in some sense representative of the ones that were omitted, in order for computations performed on them to yield approximately the right results.
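To make the idea of "representative rows" concrete, here is a minimal sketch in Python with NumPy. It is not the CSAIL algorithm; it uses leverage-score sampling, a standard row-sampling technique chosen here only for illustration. The function name `sample_rows`, the matrix sizes, and the seeds are assumptions made for this example. Rows are drawn with probability proportional to their leverage scores and rescaled so that the product of the small matrix with its own transpose matches that of the full matrix in expectation.

```python
import numpy as np

def sample_rows(A, m, seed=0):
    """Return an m-row weighted sample of A (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    # Leverage scores: squared row norms of an orthonormal basis
    # for the column space of A (from a reduced QR factorization).
    Q, _ = np.linalg.qr(A)
    lev = np.sum(Q**2, axis=1)
    p = lev / lev.sum()
    # Draw m row indices with replacement, favoring high-leverage rows.
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    # Rescale each kept row so that S.T @ S equals A.T @ A in expectation.
    return A[idx] / np.sqrt(m * p[idx])[:, None]

# Assumed toy data: 10,000 rows, 5 columns.
rng = np.random.default_rng(1)
A = rng.standard_normal((10_000, 5))
S = sample_rows(A, m=500)

# Compare a quantity computed on the small matrix with the full one.
rel_err = np.linalg.norm(S.T @ S - A.T @ A) / np.linalg.norm(A.T @ A)
print(S.shape, rel_err)
```

In this sketch the reduced matrix has 500 rows instead of 10,000, and the relative error printed at the end indicates how closely a computation on the sample tracks the same computation on the original table.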

At the ACM Symposium on Theory of Computing in June, CSAIL researchers will present a new algorithm that finds the smallest possible approximation of the original matrix that guarantees reliable computations. For a class of problems important in engineering and machine learning, this is a significant improvement over previous techniques. And for all classes of problems, the algorithm finds the approximation as quickly as possible.

Read more at MIT News: http://newsoffice.mit.edu/2015/algorithm-shrinks-big-data-0520