Similarity Join over Array Data

Cited by: 21

Authors:

Zhao, Weijie [1]

Rusu, Florin [1,2]

Dong, Bin [2]

Wu, Kesheng [2]

Affiliations:

[1] UC Merced, Merced, CA 95343 USA

[2] Lawrence Berkeley Natl Lab, Berkeley, CA USA

Source:

SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2016

DOI:

10.1145/2882903.2915247

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Scientific applications are generating an ever-increasing volume of multi-dimensional data that are largely processed inside distributed array databases and frameworks. Similarity join is a fundamental operation across scientific workloads that requires complex processing over an unbounded number of pairs of multi-dimensional points. In this paper, we introduce a novel distributed similarity join operator for multi-dimensional arrays. Unlike immediate extensions to array join and relational similarity join, the proposed operator minimizes the overall data transfer and network congestion while providing load balancing, without completely repartitioning and replicating the input arrays. We formally define array similarity join and present the design, optimization strategies, and evaluation of the first array similarity join operator.
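To make the basic operation concrete, here is a minimal single-node sketch of a similarity join between two sparse multi-dimensional arrays. It assumes (none of this comes from the record above) that each array is a dict mapping integer cell coordinates to values, that "similar" means an L1 coordinate distance of at most `eps`, and that the helper name `array_similarity_join` is hypothetical. Because array cells are addressed by coordinates, the sketch probes the neighborhood of each cell instead of comparing every pair. It does not model the paper's distributed operator, which is concerned with partitioning, data transfer, and load balancing across nodes.

```python
from itertools import product
from typing import Dict, List, Tuple

Coord = Tuple[int, ...]


def array_similarity_join(
    a: Dict[Coord, float], b: Dict[Coord, float], eps: int
) -> List[Tuple[Coord, Coord]]:
    """Hypothetical helper: return all cell pairs (p, q), p in a and q in b,
    whose L1 coordinate distance is at most eps (an assumed similarity measure)."""
    if not a:
        return []
    dims = len(next(iter(a)))
    # Precompute every offset inside the L1 ball of radius eps.
    offsets = [
        o
        for o in product(range(-eps, eps + 1), repeat=dims)
        if sum(abs(x) for x in o) <= eps
    ]
    pairs: List[Tuple[Coord, Coord]] = []
    for p in sorted(a):
        # Probe only the neighborhood of p instead of scanning all of b.
        for o in offsets:
            q = tuple(x + y for x, y in zip(p, o))
            if q in b:
                pairs.append((p, q))
    return pairs


if __name__ == "__main__":
    a = {(0, 0): 1.0, (5, 5): 2.0}
    b = {(0, 1): 3.0, (1, 1): 4.0, (5, 5): 5.0}
    print(array_similarity_join(a, b, 1))
```

Probing a fixed neighborhood costs time proportional to the number of cells in `a` times the size of the L1 ball, which is attractive for small `eps` on dense grids; for large `eps` a nested-loop comparison over the cells of `b` may be cheaper.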


Pages: 2007-2022

Number of pages: 16