Cardinality Estimation using Lp norms on Degree Sequences
Speaker
Dan Olteanu
University of Zurich
Host
Nesime Tatbul
Abstract
Cardinality estimation is the problem of estimating the size of the output of a query, without actually evaluating the query. The cardinality estimator is a critical piece of any query optimizer, and is often the main culprit when the optimizer chooses a poor plan.
In this talk, I will introduce LpBound, a pessimistic cardinality estimator that computes a guaranteed upper bound on the size of the query output. Besides common statistics used for cardinality estimation, such as the sizes of the input relations, the domain sizes for various columns, histograms, and the most common key values in join columns, LpBound also uses Lp norms on the degree sequences of the join columns. The computed cardinality bound is an optimal solution of a linear program whose constraints encode such statistics and Shannon inequalities.
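To make the key statistic concrete: the degree sequence of a join column lists, for each distinct value, how many tuples carry it, and its Lp norms interpolate between two classic statistics (the L1 norm is the column's tuple count; the L-infinity norm is the maximum frequency). The sketch below is an illustrative computation of these norms, not LpBound's implementation; the function name and interface are assumptions.

```python
from collections import Counter

def lp_norms(column_values, ps=(1, 2, float("inf"))):
    """Illustrative sketch: Lp norms of a join column's degree sequence.

    The degree sequence gives, for each distinct value v, the number of
    tuples with v in this column. ||deg||_1 is the column's tuple count;
    ||deg||_inf is the maximum frequency (the classic max-degree statistic);
    intermediate p values interpolate between the two.
    """
    degrees = Counter(column_values).values()
    norms = {}
    for p in ps:
        if p == float("inf"):
            norms[p] = max(degrees)
        else:
            norms[p] = sum(d ** p for d in degrees) ** (1.0 / p)
    return norms

# Example: column with degrees (3, 1) for values "a" and "b".
print(lp_norms(["a", "a", "a", "b"]))
```

Such norms, one per column and per p, become linear constraints (after taking logarithms) in the linear program whose optimum is the cardinality bound.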
I will also report on an ongoing experimental evaluation of LpBound on a variety of queries from the JOB, STATS, and subgraph matching benchmarks: acyclic and cyclic queries, full and group-by queries, and queries with and without range and equality predicates. LpBound's estimation time and memory usage are very close to those of the traditional estimators used by PostgreSQL, DuckDB, and a commercial database system. Yet the range of LpBound's overestimates is much smaller than the range of the traditional estimates, which can both underestimate and overestimate by orders of magnitude. When fed the LpBound cardinalities, PostgreSQL can evaluate queries faster than when using its own estimates.
This is joint work with Mahmoud Abo Khamis (RelationalAI), Chris Mayer (U Zurich), Dan Suciu (U Washington), and Haozhe Zhang (U Zurich).
Bio
Dan Olteanu is a professor at the University of Zurich, where he leads the Data Systems and Theory group (https://www.ifi.uzh.ch/en/dast.html), and a computer scientist at RelationalAI (https://relational.ai), where he works on incremental view maintenance, cardinality estimation, in-database machine learning, and data science projects.