Processing of Rank Joins in Highly Distributed Systems

被引:5
作者
Doulkeridis, Christos [1 ]
Vlachou, Akrivi [1 ]
Norvag, Kjetil [1 ]
Kotidis, Yannis [2 ]
Polyzotis, Neoklis [3 ]
机构
[1] Norwegian Univ Sci & Technol NTNU, Trondheim, Norway
[2] AUEB, Athens, Greece
[3] Univ Calif Santa Cruz, Santa Cruz, CA USA
来源
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE) | 2012年
关键词
D O I
10.1109/ICDE.2012.108
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In this paper, we study efficient processing of rank joins in highly distributed systems, where servers store fragments of relations in an autonomous manner. Existing rank join algorithms exhibit poor performance in this setting due to excessive communication costs or high latency. We propose a novel distributed rank-join framework that determines the subset of each relational fragment that needs to be fetched to generate the top-k join results. At the heart of our framework lies a score bound estimation algorithm that utilizes statistics to produce sufficient score bounds for each relation, which guarantee the correctness of the rank join result set. Furthermore, we propose a generalization of our framework that supports approximate statistics, in the case that the exact statistical information is not available. In addition, we demonstrate how our algorithms can utilize distributed statistics to process top-k join queries without any modifications. An extensive experimental study validates the efficiency of our framework and demonstrates its advantages over existing methods.
引用
收藏
页码:606 / 617
页数:12
相关论文
共 26 条
[1]  
Aboulnaga A, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P181, DOI 10.1145/304181.304198
[2]  
AKBARINIA R., 2007, Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. ACM, New York, NY, USA, P211
[3]  
[Anonymous], 2000, ICDE
[4]  
[Anonymous], 2008, P 11 INT C EXT DAT T
[5]  
[Anonymous], 1999, IEEE Data Eng. Bull.
[6]  
Arai B., 2010, P VLDB
[7]   Top-k selection queries over relational databases:: Mapping strategies and performance evaluation [J].
Bruno, N ;
Chaudhuri, S ;
Gravano, L .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2002, 27 (02) :153-187
[8]  
Chen CM, 2002, PROC INT CONF DATA, P617, DOI 10.1109/ICDE.2002.994779
[9]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656
[10]  
Godfrey B., 2004, P INFOCOM