From Error-Estimating Codes to Approximate Nearest Neighbors Search (ANNS): A Networking Researcher’s Unusual Journey into Database Research
Speaker
Jun Xu
Georgia Institute of Technology
Host
Sam Madden
CSAIL MIT
From Error-Estimating Codes to Approximate Nearest Neighbors Search (ANNS): A Networking Researcher’s Unusual Journey into Database Research
Approximate Nearest Neighbors Search (ANNS), aka similarity search, has been an important research topic for decades, with new and challenging ANNS problems continue to emerge from database and machine learning applications. ANNS problems, classical or new alike, have become increasingly difficult to solve due to the rapid growth of data (set) size and dimension in the past two decades. This talk will summarize my research in the past three years into ANNS. In this research, we have formulated new and exciting ANNS problems (e.g. ANNS after linear transformation, or ANNS-ALT), produced highly scalable ANNS solutions (e.g., ONe Index for All Kernels, or ONIAK), and designed fundamental algorithmic primitives that are the secret sauce that makes our ANNS solutions highly scalable. These primitives are fundamental in that they can be used to solve other (than ANNS) database and machine learning problems. Indeed, so far they have already led to the resolution of three high-profile algorithmic research questions that have been open or “openly hated” for decades: (1) Can we compute an arbitrary multi-dimensional range sum of (ideally i.i.d.) RVs in polylog time and space? (2) Can we perform multi-dimensional data cube queries and updates in polylog time and space? (3) Can we compute the quadratic form x^T Gx (given x as argument) in o(d^2) time, where G is a d*d Gaussian random matrix (fixed after being generated)? Interestingly, the seed of this database research was planted a decade ago, during my pursuit of a wireless networking research problem called error-estimating codes, but embarrassingly it took me nearly a decade to connect the dots.
Jun Xu has been a Professor in the School of Computer Science at Georgia Tech since 2000. He has worked on the design and analysis of network algorithmics for over two decades. He was instrumental in introducing randomization to the design of network algorithmics. His network algorithmics research jointly with his former students has garnered Best Student Paper Awards in conferences such as ACM Sigmetrics. In the past three years, he has ventured into an emerging research field called big data algorithmics that includes topics such as locality sensitive hashing (LSH) techniques for supporting similarity search in database and machine learning applications, and follows the same guiding principle of network algorithmics: combining algorithmic thinking with systems thinking. He has been an ACM Distinguished Scientist since 2010.
Tianyu Li is inviting you to a scheduled Zoom meeting.
Topic: From Error-Estimating Codes to Approximate Nearest Neighbors Search (ANNS): A Networking Researcher’s Unusual Journey into Database Research
Time: Mar 20, 2023 03:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://mit.zoom.us/j/93410834157?pwd=d1ZyUjNuNUp3cW40aEVXcmMrR2NoZz09
Password: 652225
One tap mobile
+16465588656,,93410834157# US (New York)
+16699006833,,93410834157# US (San Jose)
Meeting ID: 934 1083 4157
US : +1 646 558 8656 or +1 669 900 6833
International Numbers: https://mit.zoom.us/u/achJstuX6c
Join by SIP
93410834157@zoomcrc.com
Join by Skype for Business
https://mit.zoom.us/skype/93410834157
Approximate Nearest Neighbors Search (ANNS), aka similarity search, has been an important research topic for decades, with new and challenging ANNS problems continue to emerge from database and machine learning applications. ANNS problems, classical or new alike, have become increasingly difficult to solve due to the rapid growth of data (set) size and dimension in the past two decades. This talk will summarize my research in the past three years into ANNS. In this research, we have formulated new and exciting ANNS problems (e.g. ANNS after linear transformation, or ANNS-ALT), produced highly scalable ANNS solutions (e.g., ONe Index for All Kernels, or ONIAK), and designed fundamental algorithmic primitives that are the secret sauce that makes our ANNS solutions highly scalable. These primitives are fundamental in that they can be used to solve other (than ANNS) database and machine learning problems. Indeed, so far they have already led to the resolution of three high-profile algorithmic research questions that have been open or “openly hated” for decades: (1) Can we compute an arbitrary multi-dimensional range sum of (ideally i.i.d.) RVs in polylog time and space? (2) Can we perform multi-dimensional data cube queries and updates in polylog time and space? (3) Can we compute the quadratic form x^T Gx (given x as argument) in o(d^2) time, where G is a d*d Gaussian random matrix (fixed after being generated)? Interestingly, the seed of this database research was planted a decade ago, during my pursuit of a wireless networking research problem called error-estimating codes, but embarrassingly it took me nearly a decade to connect the dots.
Jun Xu has been a Professor in the School of Computer Science at Georgia Tech since 2000. He has worked on the design and analysis of network algorithmics for over two decades. He was instrumental in introducing randomization to the design of network algorithmics. His network algorithmics research jointly with his former students has garnered Best Student Paper Awards in conferences such as ACM Sigmetrics. In the past three years, he has ventured into an emerging research field called big data algorithmics that includes topics such as locality sensitive hashing (LSH) techniques for supporting similarity search in database and machine learning applications, and follows the same guiding principle of network algorithmics: combining algorithmic thinking with systems thinking. He has been an ACM Distinguished Scientist since 2010.
Tianyu Li is inviting you to a scheduled Zoom meeting.
Topic: From Error-Estimating Codes to Approximate Nearest Neighbors Search (ANNS): A Networking Researcher’s Unusual Journey into Database Research
Time: Mar 20, 2023 03:00 PM Eastern Time (US and Canada)
Join Zoom Meeting
https://mit.zoom.us/j/93410834157?pwd=d1ZyUjNuNUp3cW40aEVXcmMrR2NoZz09
Password: 652225
One tap mobile
+16465588656,,93410834157# US (New York)
+16699006833,,93410834157# US (San Jose)
Meeting ID: 934 1083 4157
US : +1 646 558 8656 or +1 669 900 6833
International Numbers: https://mit.zoom.us/u/achJstuX6c
Join by SIP
93410834157@zoomcrc.com
Join by Skype for Business
https://mit.zoom.us/skype/93410834157