A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries

Cited by: 6
Authors
Garcia-Garcia, Francisco [1]
Corral, Antonio [1]
Iribarne, Luis [1]
Mavrommatis, George [2]
Vassilakopoulos, Michael [2]
Affiliations
[1] Univ Almeria, Dept Informat, Almeria, Spain
[2] Univ Thessaly, DaSE Lab, Dept Elect & Comp Engn, Volos, Greece
Source
ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2017 | 2017 / Vol. 10509
Keywords
Spatial data processing; Distance joins; SpatialHadoop; LocationSpark; ALGORITHMS
DOI
10.1007/978-3-319-66917-5_15
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to total execution time.
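To make the two query definitions in the abstract concrete, the following is a minimal single-machine sketch in Python. It is a brute-force illustration only, not the distributed SpatialHadoop or LocationSpark algorithms evaluated in the paper. The function names `kcpq` and `edjq`, the use of Euclidean distance, and the inclusive threshold (`<=`) are assumptions made for this example.

```python
import heapq
import math
from itertools import product


def kcpq(P, Q, k):
    """K Closest Pair Query (brute force, illustrative only).

    Returns up to k tuples (distance, p, q) with p in P and q in Q,
    sorted by ascending Euclidean distance.
    """
    # Enumerate every cross pair and keep the k smallest by distance.
    pairs = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, pairs)


def edjq(P, Q, eps):
    """Epsilon Distance Join Query (brute force, illustrative only).

    Returns all pairs (p, q) whose Euclidean distance is at most eps.
    Treating the threshold as inclusive is an assumption of this sketch.
    """
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) <= eps]


if __name__ == "__main__":
    # Hypothetical toy datasets, chosen only for demonstration.
    P = [(0, 0), (5, 5)]
    Q = [(1, 0), (5, 6), (10, 10)]
    print(kcpq(P, Q, 2))
    print(edjq(P, Q, 1.0))
```

Both functions examine all |P| x |Q| pairs, which is exactly the cost that partition-based distributed algorithms aim to avoid on large datasets.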
Pages: 214-228
Number of pages: 15