Making databases faster

All modern applications -- from mobile phones to the web -- use database systems to store and retrieve data. Database systems are the backbone of virtually all of our modern Information Technology (IT) infrastructure. 

Modern database systems support Structured Query Language (SQL), a programming language used to query, process, and manipulate data. SQL is declarative: the user specifies what has to be done rather than how to do it. It is then up to the database system to decide how best to execute the query. A component called the query optimizer makes that decision, choosing among thousands of alternative execution strategies, known as query plans.
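To make "declarative" concrete, here is a small sketch using Python's built-in sqlite3 module. The table, column, and index names are made up for illustration. The same SQL text is run twice, and the database is asked which plan it chose each time. Adding an index changes the plan while the query and its answer stay the same.

```python
import sqlite3

# In-memory database with a small, made-up "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ada", 10.0), ("bob", 25.5), ("ada", 4.5)],
)

# The query says WHAT we want (Ada's total), not HOW to find the rows.
QUERY = "SELECT SUM(total) FROM orders WHERE customer = ?"


def describe_plan(connection, sql, params):
    """Return the human-readable steps of the plan SQLite picked."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the description is the last column


result_before = conn.execute(QUERY, ("ada",)).fetchall()
plan_before = describe_plan(conn, QUERY, ("ada",))

# Give the optimizer a new option; the SQL text itself does not change.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

result_after = conn.execute(QUERY, ("ada",)).fetchall()
plan_after = describe_plan(conn, QUERY, ("ada",))

print(plan_before)
print(plan_after)
```

The exact wording of the plan descriptions varies between SQLite versions, so the sketch prints them rather than assuming a particular string.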

A “good” query plan might return an answer in seconds, whereas a “bad” one could run for a month. As a result, many large database companies spend countless hours and considerable capital improving their query optimizers. 

Recent efforts in the field have attempted to build query optimizers using neural networks (NN), rather than relying on hand-tuned cost models and rules to translate a SQL query into a “good” query plan. Unfortunately, none of the existing neural net models are practical yet. They take a long time to train, which is a problem whenever the data or workload changes. The decisions made by a neural net model are also often not interpretable, so many database administrators would find them untrustworthy. 

Researchers out of MIT’s Data Systems and AI Lab (DSAIL) have now devised a new way to improve query optimizers, called “Bao” (short for “Bandit Optimizer”). Rather than trying to entirely replace the traditional query optimizer with a neural net, the researchers built a neural net model that improves the performance of existing optimizers by “steering” them in the right direction. 
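One way to picture this kind of “steering” is as a multi-armed bandit problem: for each query, pick one of a few restrictions on the existing optimizer, observe how long the query took, and learn from the result. The sketch below is a deliberately simplified illustration, not the researchers' implementation. The hint-set names and latencies are invented, the database is simulated, and a running average per hint set stands in for the neural net model described in the article.

```python
import math
import random

# Hypothetical hint sets: each one restricts what the existing optimizer may do.
HINT_SETS = ["default", "no_nested_loop", "no_hash_join"]

# Made-up average latencies (in seconds) standing in for a real database.
SIMULATED_MEAN_LATENCY = {"default": 1.0, "no_nested_loop": 0.6, "no_hash_join": 1.5}


def run_query(hint_set, rng):
    """Pretend to run the query under a hint set and return a noisy latency."""
    return max(0.0, rng.gauss(SIMULATED_MEAN_LATENCY[hint_set], 0.1))


def steer(rounds, seed=0):
    """Choose a hint set per query using a simple Gaussian Thompson sampling rule."""
    rng = random.Random(seed)
    counts = {h: 0 for h in HINT_SETS}
    mean_latency = {h: 0.0 for h in HINT_SETS}
    for _ in range(rounds):
        untried = [h for h in HINT_SETS if counts[h] == 0]
        if untried:
            # Try every hint set once before comparing them.
            choice = untried[0]
        else:
            # Sample a plausible latency for each hint set; the spread
            # shrinks as a hint set accumulates observations.
            sampled = {
                h: rng.gauss(mean_latency[h], 1.0 / math.sqrt(counts[h]))
                for h in HINT_SETS
            }
            choice = min(sampled, key=sampled.get)
        latency = run_query(choice, rng)
        counts[choice] += 1
        # Incremental update of the observed mean latency.
        mean_latency[choice] += (latency - mean_latency[choice]) / counts[choice]
    return counts, mean_latency


counts, mean_latency = steer(rounds=500)
best = max(counts, key=counts.get)
print(counts, best)
```

Because the existing optimizer still produces every plan, a sketch like this only ever chooses among plans the database could already generate, which is part of what makes the approach easier to bolt onto an existing system.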

“This approach can be more easily integrated into existing systems, and the results become more interpretable, so they can be used as an ‘advisor,’ whereby, instead of replacing the query optimizer, it can be used to give recommendations to a database administrator,” says MIT professor Tim Kraska, the lead advisor on the project.

The researchers tested Bao on various open-source and commercial database systems, and showed that their approach can improve existing optimizers by up to 50 percent, without changing the code of the original database. 

Many database companies have already started to explore how the approach of Bao could help with the performance of their systems. For example, researchers from Microsoft and MIT have explored how the Bao technique could help with their big data workloads, and found that it can improve latency on average by 7-30 percent, and up to 90 percent for non-trivial queries. 

The Bao paper will be presented virtually this week at the 2021 ACM SIGMOD conference, where it also won a best paper award. 

Kraska worked on the project alongside lead author Ryan Marcus (MIT and Intel Labs), Parimarjan Negi (MIT), Hongzi Mao (MIT), Nesime Tatbul (MIT and Intel Labs), and MIT professor Mohammad Alizadeh. 

To learn more about the effort, visit: https://learnedsystems.mit.edu/ and http://dsail.csail.mit.edu/