GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data

Times cited: 187
Authors
Yu, Jia [1]
Wu, Jinxuan [1]
Sarwat, Mohamed [1]
Affiliation
[1] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, 699 S Mill Ave, Tempe, AZ 85287 USA
Source
23RD ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2015) | 2015
Keywords
Cluster computing; Large-scale data; Spatial data
DOI
10.1145/2820783.2820860
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Spark functionality, including loading/storing data to disk and regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend regular Apache Spark RDDs to support geometrical and spatial objects. GeoSpark provides a geometrical operations library that accesses Spatial RDDs to perform basic geometrical operations (e.g., Overlap, Intersect). System users can leverage the newly defined SRDDs to develop spatial data processing programs in Spark effectively. The Spatial Query Processing Layer efficiently executes spatial query processing algorithms (e.g., spatial range, spatial join, and KNN queries) on SRDDs. GeoSpark also allows users to create a spatial index (e.g., R-tree, Quad-tree) that boosts spatial data processing performance within each SRDD partition. Preliminary experiments show that GeoSpark achieves better run-time performance than its Hadoop-based counterparts (e.g., SpatialHadoop).
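The abstract says that queries run on partitioned spatial datasets, with optional indexing inside each partition. The idea of answering a query partition by partition and then combining the partial answers can be illustrated with a small sketch.

This is not GeoSpark's API. Everything below is a hypothetical stand-in chosen for illustration:

- the functions `grid_partition`, `range_query` and `knn_query`;
- the uniform grid partitioning;
- representing a partition as a plain Python list of points.

A linear scan replaces the per-partition R-tree or Quad-tree that the abstract mentions.

```python
import heapq
import math
from typing import List, Tuple

# Hypothetical stand-ins (not GeoSpark's API): a "partition" is a list of
# (x, y) points and a partitioned dataset is a list of such partitions.
Point = Tuple[float, float]
Envelope = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def grid_partition(points: List[Point], cells: int, extent: Envelope) -> List[List[Point]]:
    """Assign each point to one cell of a uniform cells x cells grid.

    Assumes every point lies inside `extent`.
    """
    min_x, min_y, max_x, max_y = extent
    width = (max_x - min_x) / cells
    height = (max_y - min_y) / cells
    partitions: List[List[Point]] = [[] for _ in range(cells * cells)]
    for x, y in points:
        # Clamp so points on the upper boundary fall into the last row/column.
        col = min(int((x - min_x) / width), cells - 1)
        row = min(int((y - min_y) / height), cells - 1)
        partitions[row * cells + col].append((x, y))
    return partitions


def range_query(partitions: List[List[Point]], window: Envelope) -> List[Point]:
    """Filter each partition independently, then union the partial results."""
    wx1, wy1, wx2, wy2 = window
    result: List[Point] = []
    for part in partitions:  # in a cluster engine, each iteration would be a separate task
        # Linear scan standing in for a per-partition spatial index lookup.
        result.extend(p for p in part if wx1 <= p[0] <= wx2 and wy1 <= p[1] <= wy2)
    return result


def knn_query(partitions: List[List[Point]], q: Point, k: int) -> List[Point]:
    """Take each partition's local k nearest points, then merge into a global top-k."""
    candidates: List[Point] = []
    for part in partitions:
        candidates.extend(heapq.nsmallest(k, part, key=lambda p: math.dist(p, q)))
    # The true k nearest neighbors must be among the local top-k candidates.
    return heapq.nsmallest(k, candidates, key=lambda p: math.dist(p, q))


if __name__ == "__main__":
    pts = [(1.0, 1.0), (2.0, 8.0), (5.0, 5.0), (7.5, 2.5), (9.0, 9.0), (4.0, 6.0)]
    parts = grid_partition(pts, 2, (0.0, 0.0, 10.0, 10.0))
    print(sorted(range_query(parts, (3.0, 3.0, 8.0, 8.0))))  # [(4.0, 6.0), (5.0, 5.0)]
    print(knn_query(parts, (5.0, 5.0), 2))                    # [(5.0, 5.0), (4.0, 6.0)]
```

The KNN merge is correct because any global nearest neighbor must also be among the k nearest points of its own partition. Points map to exactly one grid cell. Non-point geometries (polygons, rectangles) that straddle cell borders would need to be assigned to every overlapping cell, with duplicates removed from the final result.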
Pages: 4
References (10 entries)
[1] Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce [J].
Aji, Ablimit ;
Wang, Fusheng ;
Vo, Hoang ;
Lee, Rubao ;
Liu, Qiaoling ;
Zhang, Xiaodong ;
Saltz, Joel .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (11) :1009-1020
[2] [Anonymous], 1998, GEOINFORMATICA, DOI 10.1023/A:1009755931056
[3] A Demonstration of SpatialHadoop: An Efficient MapReduce Framework for Spatial Data [J].
Eldawy, Ahmed ;
Mokbel, Mohamed F. .
PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (12) :1230-1233
[4] Guttman A., 1984, SIGMOD
[5] Parallel Secondo: Boosting Database Engines with Hadoop [J].
Lu, Jiamin ;
Gueting, Ralf Hartmut .
PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, :738-743
[6] A non-blocking parallel spatial join algorithm [J].
Luo, G ;
Naughton, JF ;
Ellmann, CJ .
18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, :697-705
[7] Nishimura S., 2011, 2011 12th IEEE International Conference on Mobile Data Management (MDM 2011), P7, DOI 10.1109/MDM.2011.41
[8] Roussopoulos N., 1995, ACM SIGMOD RECORD, V24, P71-79
[9] THE QUADTREE AND RELATED HIERARCHICAL DATA-STRUCTURES [J].
SAMET, H .
COMPUTING SURVEYS, 1984, 16 (02) :187-260
[10] Zaharia M., 2012, 9 USENIX S NETWORKED