SIZE BOUNDS AND QUERY PLANS FOR RELATIONAL JOINS

Cited by: 84
Authors
Atserias, Albert [1]
Grohe, Martin [2]
Marx, Daniel [3]
Affiliations
[1] UPC, Dept Llenguatges & Sistemes Informac LSI, Barcelona 08034, Spain
[2] Rhein Westfal TH Aachen, Lehrstuhl Informat 7, D-52056 Aachen, Germany
[3] Hungarian Acad Sci MTA SZTAKI, Comp & Automat Res Inst, H-1518 Budapest, Hungary
Funding
European Research Council
Keywords
fractional edge cover; linear programming; join; database query; query plan
DOI
10.1137/110859440
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Relational joins are at the core of relational algebra, which in turn is the core of the standard database query language SQL. As their evaluation is expensive and very often dominated by the output size, it is an important task for database query optimizers to compute estimates on the size of joins and to find good execution plans for sequences of joins. We study these problems from a theoretical perspective, both in the worst-case model and in an average-case model where the database is chosen according to a known probability distribution. In the former case, our first key observation is that the worst-case size of a query is characterized by the fractional edge cover number of its underlying hypergraph, a combinatorial parameter previously known to provide an upper bound. We complete the picture by proving a matching lower bound and by showing that there exist queries for which the join-project plan suggested by the fractional edge cover approach may be substantially better than any join plan that does not use intermediate projections. On the other hand, we show that in the average-case model, every join-project plan can be turned into a plan containing no projections in such a way that the expected time to evaluate the plan increases only by a constant factor independent of the size of the database. Not surprisingly, the key combinatorial parameter in this context is the maximum density of the underlying hypergraph. We show how to make effective use of this parameter to eliminate the projections.
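The worst-case bound described in the abstract can be illustrated on the triangle query. The following is a minimal sketch, not taken from the paper: the relation names `R`, `S`, `T`, the helper functions, and the grid-shaped test instance are all assumptions chosen for illustration. It checks that weight 1/2 on each relation is a fractional edge cover of the query's hypergraph, evaluates the resulting size bound (the product of the relation sizes raised to their weights), and compares it with the actual join size on an instance where the bound is attained.

```python
from fractions import Fraction
from itertools import product

# Triangle query Q(a, b, c) = R(a, b) JOIN S(b, c) JOIN T(a, c).
# Each relation is a hyperedge over the attributes it mentions.
# (Relation and attribute names are illustrative assumptions.)
edges = {"R": ("a", "b"), "S": ("b", "c"), "T": ("a", "c")}

# Candidate fractional edge cover: weight 1/2 on every relation (total 3/2).
weights = {name: Fraction(1, 2) for name in edges}


def is_fractional_edge_cover(edges, weights):
    """Each attribute must get total weight >= 1 from the edges containing it."""
    attributes = {attr for attrs in edges.values() for attr in attrs}
    return all(
        sum(weights[name] for name, attrs in edges.items() if attr in attrs) >= 1
        for attr in attributes
    )


def size_bound(sizes, weights):
    """Upper bound on the join size: product of |R_j| ** x_j over all relations."""
    bound = 1.0
    for name, size in sizes.items():
        bound *= size ** float(weights[name])
    return bound


def triangle_join(R, S, T):
    """Naive evaluation of the triangle query, for checking only."""
    S_set, T_set = set(S), set(T)
    values_c = {c for (_, c) in S_set}
    return [
        (a, b, c)
        for (a, b) in R
        for c in values_c
        if (b, c) in S_set and (a, c) in T_set
    ]


# Hypothetical instance: every relation is the full k x k grid, so N = k * k.
k = 4
grid = list(product(range(k), repeat=2))
R, S, T = grid, grid, grid
sizes = {"R": len(R), "S": len(S), "T": len(T)}

result = triangle_join(R, S, T)
print(is_fractional_edge_cover(edges, weights))
print(len(result), size_bound(sizes, weights))
```

On this instance each relation has N = 16 tuples and the join returns all k^3 = 64 triples, which equals N^(3/2), so the bound given by this cover is met exactly. Computing the optimal cover in general requires solving a linear program; the sketch only verifies a cover supplied by hand.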
Pages: 1737-1767
Number of pages: 31