Efficient Parallel Skyline Query Processing for High-Dimensional Data

被引:13
作者
Tang, Mingjie [1 ]
Yu, Yongyang [1 ]
Aref, Walid G. [2 ,3 ]
Malluhi, Qutaibah M. [4 ]
Ouzzani, Mourad [5 ]
机构
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47906 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Purdue Univ, Ctr Educ & Res Informat Assurance & Secur CERIAS, W Lafayette, IN 47907 USA
[4] Qatar Univ, Dept Comp Sci & Engn, Doha, Qatar
[5] HBKU, Qatar Comp Res Inst, Doha, Qatar
基金
美国国家科学基金会;
关键词
Skyline query; query processing; high-dimensional data; parallel computing; COMPUTATION;
D O I
10.1109/TKDE.2018.2809598
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Given a set of multidimensional data points, skyline queries retrieve those points that are not dominated by any other points in the set. Due to the ubiquitous se of skyline queries, such as in preference-based query answering and decision making, and the large amount of data that these queries have to deal with, enabling their scalable processing is of critical importance. However, there are several outstanding challenges that have not been well addressed. More specifically, in this paper, we are tackling the data straggler and data skew challenges introduced by distributed skyline query processing, as well as the ensuing high computation cost of merging skyline candidates. We thus introduce a new efficient three-phase approach for large scale processing of skyline queries. In the first preprocessing phase, the data is partitioned along the Z-order curve. We utilize a novel data partitioning approach that formulates data partitioning as an optimization problem to minimize the size of intermediate data. In the second phase, each compute node partitions the input data points into disjoint subsets, and then performs the skyline computation on each subset to produce skyline candidates in parallel. In the final phase, we build an index and employ an efficient algorithm to merge the generated skyline candidates. Extensive experiments demonstrate that the proposed skyline algorithm achieves more than one order of magnitude enhancement in performance compared to existing state-of-the-art approaches.
引用
收藏
页码:1838 / 1851
页数:14
相关论文
共 28 条
[1]   Parallel Skyline Queries [J].
Afrati, Foto N. ;
Koutris, Paraschos ;
Suciu, Dan ;
Ullman, Jeffrey D. .
THEORY OF COMPUTING SYSTEMS, 2015, 57 (04) :1008-1037
[2]  
[Anonymous], CIKM 2006
[3]  
[Anonymous], 2001, Approximation algorithms
[4]  
[Anonymous], 2000, P 26 INT C VER LARG
[5]  
Balke WT, 2004, LECT NOTES COMPUT SC, V2992, P256
[6]   The Skyline operator [J].
Börzsönyi, S ;
Kossmann, D ;
Stocker, K .
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, :421-430
[7]  
Catania B., 2012, ADV QUERY PROCESSING, V1
[8]   Skyline with presorting [J].
Chomicki, J ;
Godfrey, P ;
Gryz, J ;
Liang, DM .
19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, :717-719
[9]  
Cosgaya-Lozano Adan., 2007, 21st Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2007), 13-16 May 2007, Saskatoon, Saskatchewan, Canada, page, P12
[10]   A survey of skyline processing in highly distributed environments [J].
Hose, Katja ;
Vlachou, Akrivi .
VLDB JOURNAL, 2012, 21 (03) :359-384