Similarity Join over Array Data

Cited by: 21
Authors
Zhao, Weijie [1]
Rusu, Florin [1, 2]
Dong, Bin [2]
Wu, Kesheng [2]
Affiliations
[1] UC Merced, Merced, CA 95343 USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA
Source
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2016
DOI
10.1145/2882903.2915247
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Scientific applications are generating an ever-increasing volume of multi-dimensional data that are largely processed inside distributed array databases and frameworks. Similarity join is a fundamental operation across scientific workloads that requires complex processing over an unbounded number of pairs of multi-dimensional points. In this paper, we introduce a novel distributed similarity join operator for multi-dimensional arrays. Unlike immediate extensions to array join and relational similarity join, the proposed operator minimizes the overall data transfer and network congestion while providing load-balancing, without completely repartitioning and replicating the input arrays. We formally define array similarity join and present the design, optimization strategies, and evaluation of the first array similarity join operator.
Pages: 2007-2022
Page count: 16