Top-k spatial distance joins

被引:1
作者
Qi, Shuyao [1 ]
Bouros, Panagiotis [2 ]
Mamoulis, Nikos [3 ]
机构
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Johannes Gutenberg Univ Mainz, Inst Comp Sci, Mainz, Germany
[3] Univ Ioannina, Dept Comp Sci & Engn, Ioannina, Greece
关键词
Top-k join; Spatial join; QUERIES;
D O I
10.1007/s10707-020-00393-z
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Top-k joins have been extensively studied when numerical valued attributes are joined on an equality predicate. Other types of join attributes and predicates have received little to no attention. In this paper, we consider spatial objects that are assigned a score (e.g., a ranking). Give two collections R, S of such objects and a spatial distance threshold & x1d716;& xdf16;, we introduce the top-k spatial distance join (k-SDJoin) to identify the k pairs of objects, which have the highest combined score (based on an aggregate function gamma) among all object pairs in R x S with a spatial distance at most & x1d716;& xdf16;. State-the-of-art methods for relational top-k joins can be adapted for k-SDJoin, but their focus is on minimizing the number of objects accessed from the inputs; however, when spatial objects are joined, the computational cost can easily become the bottleneck. In view of this, we propose a novel evaluation algorithm, which greatly reduces the computational cost, without compromising the access cost. The main idea is to access and efficiently join blocks of objects from each collection, using appropriate bounds to avoid computing the entire spatial & x1d716;& xdf16;-distance join. As the performance of our solution heavily relies on the size of the input blocks, we devise an approach for automated block size tuning enhanced by a novel generic model for estimating the number of objects to be accessed from each input. Contrary to previous efforts, our model employs cheap-to-compute statistics and requires no prior knowledge of data distribution. Our extensive experimental analysis demonstrates the efficiency of our algorithm compared to methods based on existing literature that prioritize either the ranking or the spatial join component of k-SDJoin queries.
引用
收藏
页码:591 / 631
页数:41
相关论文
共 46 条
[1]  
[Anonymous], 1996, ACM SIGMOD RECORD
[2]  
Arge L., 1998, Proceedings of the Twenty-Fourth International Conference on Very-Large Databases, P570
[3]  
Belussi A., 1995, VLDB '95. Proceedings of the 21st International Conference on Very Large Data Bases, P299
[4]   Efficient processing of spatial joins using R-trees [J].
Brinkhoff, Thomas ;
Kriegel, Hans-Peter ;
Seeger, Bernhard .
SIGMOD Record, 1993, 22 (02) :237-246
[5]  
Chakrabarti K, 2011, PROC INT CONF DATA, P709, DOI 10.1109/ICDE.2011.5767855
[6]   Buffer queries [J].
Chan, EPF .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (04) :895-910
[7]  
Corral A, 2000, SIGMOD REC, V29, P189, DOI 10.1145/335191.335414
[8]   Processing of Rank Joins in Highly Distributed Systems [J].
Doulkeridis, Christos ;
Vlachou, Akrivi ;
Norvag, Kjetil ;
Kotidis, Yannis ;
Polyzotis, Neoklis .
2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, :606-617
[9]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656
[10]  
Faloutsos C, 2000, SIGMOD REC, V29, P177, DOI 10.1145/335191.335412