Distributed arrays: an algebra for generic distributed query processing

Cited by: 0
Authors
Ralf Hartmut Güting
Thomas Behr
Jan Kristof Nidzwetzki
Affiliation
[1] FernUniversität in Hagen, Faculty of Mathematics and Computer Science
Source
Distributed and Parallel Databases | 2021 / Vol. 39
Keywords
Distributed database; Distributed query processing; Density-based similarity clustering
DOI
Not available
Abstract
We propose a simple model for distributed query processing based on the concept of a distributed array. Such an array has fields of some data type whose values can be stored on different machines. It offers operations to manipulate all fields in parallel within the distributed algebra. The arrays considered are one-dimensional and just serve to model a partitioned and distributed data set. Distributed arrays rest on a given set of data types and operations called the basic algebra implemented by some piece of software called the basic engine. It provides a complete environment for query processing on a single machine. We assume this environment is extensible by types and operations. Operations on distributed arrays are implemented by one basic engine called the master which controls a set of basic engines called the workers. It maps operations on distributed arrays to the respective operations on their fields executed by workers. The distributed algebra is completely generic: any type or operation added in the extensible basic engine will be immediately available for distributed query processing. To demonstrate the use of the distributed algebra as a language for distributed query processing, we describe a fairly complex algorithm for distributed density-based similarity clustering. The algorithm is a novel contribution by itself. Its complete implementation is shown in terms of the distributed algebra and the basic algebra. As a basic engine the Secondo system is used, a rich environment for extensible query processing, providing useful tools such as main memory M-trees, graphs, or a DBScan implementation.
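The master/worker mapping described in the abstract can be illustrated with a minimal sketch. This is not the Secondo API or the paper's implementation: the names `DArray`, `dmap`, `distribute`, and `worker_of` are hypothetical, a thread pool stands in for worker machines, and the round-robin placement of fields on workers is an assumption made only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass
class DArray(Generic[T]):
    """Hypothetical one-dimensional distributed array.

    Each field would live on some worker; here all fields sit in one
    process and the placement is only simulated.
    """
    fields: List[T]
    n_workers: int

    def worker_of(self, i: int) -> int:
        # Assumed placement rule: field i is assigned round-robin.
        return i % self.n_workers

    def dmap(self, op: Callable[[T], U]) -> "DArray[U]":
        # The "master" applies the same single-machine operation to every
        # field; a thread pool stands in for the set of workers.
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            results = list(pool.map(op, self.fields))  # order is preserved
        return DArray(results, self.n_workers)


def distribute(items: Iterable[int], n_fields: int, n_workers: int) -> DArray[List[int]]:
    """Partition integers into n_fields fields by value modulo n_fields."""
    fields: List[List[int]] = [[] for _ in range(n_fields)]
    for item in items:
        fields[item % n_fields].append(item)
    return DArray(fields, n_workers)


arr = distribute(range(10), n_fields=4, n_workers=2)
counts = arr.dmap(len)  # fields: [3, 3, 2, 2]
sums = arr.dmap(sum)    # fields: [12, 15, 8, 10]
```

Because `dmap` accepts any callable, any operation that works on a single field can be applied to the whole array without further changes, which mirrors the genericity claim in the abstract at toy scale.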
Pages: 1009-1064
Page count: 55