Databases for AI: The Case for Vector Databases [Zoom Talk]

Speaker

Jianguo Wang, Purdue University

Host

CSAIL

Abstract: Vector databases have recently become a hot topic amid the widespread interest in LLMs, because they supply relevant context that helps LLMs generate more accurate responses. Current vector databases fall broadly into two categories: specialized and integrated. Specialized vector databases are designed explicitly for managing vector data, while integrated vector databases add vector search to existing database systems (mostly relational databases). Specialized vector databases are interesting, but a significant customer base is interested in integrated vector databases for several reasons, such as reluctance to move data out of existing systems, the desire to link vector embeddings with their source data, and the need for advanced vector search capabilities. However, integrated vector databases face challenges in performance and interoperability. In this talk, I will share our recent experience building integrated vector databases within two relational databases: SingleStore (VLDB'24) and PostgreSQL (CIDR'26). I will show how we address these performance and interoperability challenges, resulting in more powerful vector databases that support advanced retrieval-augmented generation (RAG). I will also present additional challenges in vector databases and our ongoing research to address them. Finally, I will discuss the broader role of database systems in the era of LLMs and how to build future data infrastructure that goes beyond vector databases to better support AI.

Bio: Jianguo Wang is an Assistant Professor of Computer Science at Purdue University. He received his Ph.D. from the University of California, San Diego. His research focuses on database systems for the cloud and LLMs, particularly disaggregated databases and vector databases. He has worked or interned at Zilliz, Amazon AWS, Microsoft Research, Oracle, and Samsung, contributing to the development of various database systems. He regularly publishes in and serves on the program committees of premier database conferences, including SIGMOD, VLDB, and ICDE. He also moderated the VLDB 2024 panel on vector databases and was invited to the Dagstuhl Seminar on vector databases. His research has impacted multiple industrial-strength database systems, including Amazon Aurora, Zilliz Milvus, SingleStore, and TigerGraph, and has been recognized with multiple awards, including the NSF CAREER Award, the ACM SIGMOD Research Highlight Award, the Google ML and Systems Junior Faculty Award, and the IEEE TCDE Rising Star Award.

----

Please reach out to markakis@mit.edu for the Zoom password.