Selectivity Estimation of Inequality Joins in Databases

被引:1
|
作者
Repas, Diogo [1 ]
Luo, Zhicheng [1 ]
Schoemans, Maxime [1 ]
Sakr, Mahmoud [1 ,2 ]
机构
[1] Univ libre Bruxelles ULB, Data Sci Lab, B-1050 Brussels, Belgium
[2] Ain Shams Univ, Fac Comp & Informat Sci, Cairo 11566, Egypt
关键词
SQL; query optimization; optimizer statistics;
D O I
10.3390/math11061383
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the results of a predicate in the query. It is the main calculation based on which the optimizer can select the least expensive plan to execute. While the problem has been known since the mid-1970s, we were surprised that there are no solutions in the literature for the selectivity estimation of inequality joins. By testing four common database systems: Oracle, SQL-Server, PostgreSQL, and MySQL, we found that the open-source systems PostgreSQL and MySQL lack this estimation. Oracle and SQL-Server make fairly accurate estimations, yet their algorithms are secret. This paper, thus, proposes an algorithm for inequality join selectivity estimation. The proposed algorithm was implemented in PostgreSQL and sent as a patch to be included in the next releases. We compared this implementation with the above DBMS for three different data distributions (uniform, normal, and Zipfian) and showed that our algorithm provides extremely accurate estimations (below 0.1% average error), outperforming the other systems by an order of magnitude.
引用
收藏
页数:18
相关论文
共 23 条
  • [1] Query size estimation for joins using systematic sampling
    Ngu, AHH
    Harangsri, B
    Shepherd, J
    DISTRIBUTED AND PARALLEL DATABASES, 2004, 15 (03) : 237 - 275
  • [2] Query Size Estimation for Joins Using Systematic Sampling
    A.H.H. Ngu
    B. Harangsri
    J. Shepherd
    Distributed and Parallel Databases, 2004, 15 : 237 - 275
  • [3] A hybrid estimator for selectivity estimation
    Ling, YB
    Sun, W
    Rishe, ND
    Xiang, XJ
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1999, 11 (02) : 338 - 354
  • [4] Selectivity Estimation of Correlated Properties in RDF Data for SPARQL Query Optimization
    Lv, Bin
    Du, Xiaoyong
    Wang, Yan
    2009 FIFTH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRID (SKG 2009), 2009, : 176 - 183
  • [5] Spatial Selectivity Estimation for Web Searching
    Patroumpas, Kostas
    WEB AND WIRELESS GEOGRAPHICAL INFORMATION SYSTEMS (W2GIS 2015), 2015, 9080 : 107 - 123
  • [6] Piecewise linear histograms for selectivity estimation
    Yu, XH
    Fu, A
    ISE'2001: PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON INFORMATION SYSTEMS AND ENGINEERING, 2001, : 319 - 325
  • [7] Estimating nested selectivity in object-oriented and object-relational databases
    Cho, Wan-Sup
    Hong, Ki-Hyung
    Loh, Woong-Kee
    INFORMATION AND SOFTWARE TECHNOLOGY, 2007, 49 (07) : 806 - 816
  • [8] Efficiently adapting graphical models for selectivity estimation
    Kostas Tzoumas
    Amol Deshpande
    Christian S. Jensen
    The VLDB Journal, 2013, 22 : 3 - 27
  • [9] Efficiently adapting graphical models for selectivity estimation
    Tzoumas, Kostas
    Deshpande, Amol
    Jensen, Christian S.
    VLDB JOURNAL, 2013, 22 (01) : 3 - 27
  • [10] Node and edge selectivity estimation for range queries in spatial networks
    Tiakas, E.
    Papadopoulos, A. N.
    Nanopoulos, A.
    Manolopoulos, Y.
    INFORMATION SYSTEMS, 2009, 34 (03) : 328 - 352