Constrained Skyline Query Processing against Distributed Data Sites

被引:28
作者
Chen, Lijiang [1 ,2 ]
Cui, Bin [1 ,2 ]
Lu, Hua [3 ]
机构
[1] Peking Univ, Key Lab High Confidence Software Technol, Minist Educ, Beijing 100871, Peoples R China
[2] Peking Univ, Dept Comp Sci, Beijing 100871, Peoples R China
[3] Aalborg Univ, Dept Comp Sci, DK-9220 Aalborg, Denmark
基金
中国国家自然科学基金;
关键词
Constrained skyline query; filtering point; distributed query processing;
D O I
10.1109/TKDE.2010.103
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The skyline of a multidimensional point set is a subset of interesting points that are not dominated by others. In this paper, we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result. We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned site groups. We also employ intragroup optimization and multifiltering technique to improve the skyline query processes within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which help identify unqualified local skyline points early on a data site. In this way, the amount of data to be transmitted via network connections is reduced, and thus, the overall query response time is shortened further. Cost models and heuristics are proposed to guide the selection of a given number of filtering points from a superset. A cost-efficient model is developed to determine how many filtering points to use for a particular data site. The results of an extensive experimental study demonstrate that our proposals are effective and efficient.
引用
收藏
页码:204 / 217
页数:14
相关论文
共 21 条
[11]  
Kian-Lee Tan, 2001, Proceedings of the 27th International Conference on Very Large Data Bases, P301
[12]  
Kossmann D., 2002, Proceedings of the Twenty-eighth International Conference on Very Large Data Bases, P275
[13]  
Lin X, 2007, 2007 IEEE INTERNATIONAL CONFERENCE ON PORTABLE INFORMATION DEVICES, P87
[14]  
Lu Y, 2008, LECT NOTES COMPUT SC, V5181, P241
[15]  
Papadias D., 2003, Proceedings of the 2003 ACM SIGMOD international conference on Management of data, P467, DOI DOI 10.1145/872757.872814
[16]   A scalable Content-Addressable Network [J].
Ratnasamy, S ;
Francis, P ;
Handley, M ;
Karp, R ;
Shenker, S .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2001, 31 (04) :161-172
[17]  
Tao YF, 2009, PROC INT CONF DATA, P892, DOI 10.1109/ICDE.2009.84
[18]  
Wang Shiyuan., 2007, ICDE, P1126
[19]  
Wu P, 2006, LECT NOTES COMPUT SC, V3896, P112
[20]  
Zhang LP, 2009, MODELLING SIMULATION, P509