Approximate Nearest Neighbor Search algorithms for web-scale search and recommendation

Speaker

Microsoft Research India

Host

Noah Golowich

MIT

Abstract:
Web-scale search and recommendation scenarios increasingly use Approximate Nearest Neighbor Search (ANNS) algorithms to index and retrieve objects based on the similarity of their learnt representations in a geometric space. Since these scenarios often span billions or trillions of objects, efficient and scalable ANNS algorithms are critical to making these systems practical. However, most algorithms studied in the literature either focus on million-scale datasets or lack features that practical indices need, e.g., support for real-time updates.
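For reference, the query that an ANNS index approximates can be written as an exact brute-force scan. The sketch below is a minimal Python illustration that assumes Euclidean distance and uses a hypothetical `exact_knn` helper. It is not DiskANN. Its per-query cost grows linearly with the number of points, since every point is compared against the query, and that cost is what makes exact search impractical at billion scale and motivates approximate indices.

```python
import heapq
import math

def exact_knn(query, points, k):
    """Return indices of the k points closest to `query` (Euclidean distance).

    Hypothetical brute-force baseline: it scans every point, so each query
    costs O(n * d) for n points of dimension d.
    """
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(query, points[i]))

# Toy 2-D "embeddings"; real learnt representations have many more dimensions.
points = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]
print(exact_knn((0.0, 0.0), points, 2))  # [0, 2]
```

You can switch to another similarity notion by changing the `key` function, for example to a negative inner product. The scan itself stays the same.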

In this talk we discuss empirical progress on this problem. Specifically, we present DiskANN, the first external-memory ANNS algorithm that can index a billion points and serve queries at interactive latencies (a few milliseconds) on a commodity machine. This is an order of magnitude more points indexed per machine than in previous work. In addition, the index supports real-time updates, and its in-memory performance compares well with other state-of-the-art indices.

We will conclude with some open problems in this space, such as support for hybrid queries that combine similarity search with hard matches on attributes like language or author, and present some preliminary results. Further, proving any reasonable bounds on the complexity of DiskANN or related graph-based ANNS indices remains an open problem.
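The following sketch makes the notion of a hybrid query concrete. It answers the query exactly by pre-filtering on the hard-match attribute, here a hypothetical per-point language label, and then ranking the matching points by Euclidean distance. It is an illustrative Python baseline with a hypothetical `filtered_knn` helper, not the approach presented in the talk. It shows the intended semantics and also why the problem is hard at scale. The scan touches every matching point. Simply filtering the output of an ordinary ANNS index can return fewer than k results when the required label is rare.

```python
import heapq
import math

def filtered_knn(query, points, labels, required_label, k):
    """Exact hybrid query: the k nearest points whose label equals `required_label`.

    Hypothetical pre-filtering baseline: apply the hard-match filter first,
    then rank only the surviving points by Euclidean distance.
    """
    candidates = [i for i, label in enumerate(labels) if label == required_label]
    return heapq.nsmallest(k, candidates,
                           key=lambda i: math.dist(query, points[i]))

# Toy data: 2-D vectors, each tagged with an (assumed) language attribute.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (0.2, 0.2)]
labels = ["en", "fr", "en", "fr"]
print(filtered_knn((0.0, 0.0), points, labels, "fr", 1))  # [1]
print(filtered_knn((0.0, 0.0), points, labels, "en", 2))  # [0, 2]
```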

Joint work with Ravishankar Krishnaswamy, Suhas J Subramanya, Aditi Singh, Rohan Kadekodi, Devvrit, Shikhar Jaiswal, Magdalen Dobson, Siddharth Gollapudi, Neel Karia, Varun Sivasankaran.

September 14 2022, 16:00 to 17:00 (America/New_York)

Location

32-G575

Organizer & Contact

Noah Golowich

nzg@csail.mit.edu
