Near-Optimal Hashing Algorithm for the Approximate Nearest Neighbor Problem

Speaker: Piotr Indyk , Massachusetts Institute of Technology
Date: February 14 2006
Time: 4:15PM to 5:15PM
Location: 32-155
Host: Madhu Sudan, Massachusetts Institute of Technology
Contact: Kevin Matulef, 3-5883, matulef@mit.edu
Relevant URL: http://theory.lcs.mit.edu/theory-seminars/The nearest neighbor problem is defined as follows: given a set of n data points in a d-dimensional space, construct a data structure which, given a query point q, returns the data point closest to the query.
The problem is of importance in multiple areas of computer science. Nearest neighbor algorithms can be used for: classification (in machine learning), similarity search (in biological, text or visual databases), data quantization (in signal processing and compression), etc. Most of these applications involve high-dimensional data. Unfortunately, classic algorithms for geometric search problems, such as kd-trees, do not scale well as the dimension increases.
In recent years, there has been extensive research done on *approximate* variants of the nearest neighbor problem. One of the algorithms, based on the idea of Locality-Sensitive Hashing (LSH), has been successfully used in several applied scenarios.
In this talk I will review that work, as well as describe recent developments in this area. In particular, I will show a new LSH-based algorithm, whose running time significantly improves over the earlier bounds. The running time of the new algorithm is provably close to the best possible in the class of hashing-based algorithms.
-----------------------------------
This is joint work with Alex Andoni (MIT).
See other events that are part of Theory Colloquium Spring 2006
See other events happening in February 2006