Big Spatial Data Processing With Apache Spark

Times cited: 0
Authors
Shangguan, Boyi [1]
Yue, Peng [1]
Wu, Zhaoyan [1]
Jiang, Liangcun [2]
Affiliations
[1] School of Remote Sensing and Information Engineering, Wuhan University, 129 Luoyu Rd, Wuhan 430079, Hubei, People's Republic of China
[2] State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Rd, Wuhan 430079, Hubei, People's Republic of China
Source
2017 6th International Conference on Agro-Geoinformatics (2017)
Funding
National Natural Science Foundation of China
Keywords
Big Spatial Data; Apache Spark; SpatialRDD; SparkSpatialSDK; MapReduce; System
DOI
Not available
Chinese Library Classification
S [Agricultural Sciences]
Subject classification code
09
Abstract
Big data technologies have shown great promise for managing geospatial data in recent years. To cope with growing volumes of spatial data, a high-performance spatial data processing system built on big data technologies is needed. In this paper, we present an approach to processing big spatial data with Apache Spark, a fast, general-purpose engine for large-scale data processing. We developed a software development kit named SparkSpatialSDK, which takes the spatial characteristics of geospatial data into account and provides a Spark-enabled spatial data structure and API that allow users to easily perform spatial analyses on big spatial data. The spatial data structure couples geometric data structures (points, lines, and polygons) with Resilient Distributed Datasets (RDDs). An interface called SpatialRDD is provided to access big spatial data stored in distributed database systems such as HBase and to load the data into the Spark processing engine. We illustrate the use of the API with example processing functions, namely spatial range queries and spatial k-nearest neighbor queries. The results demonstrate the applicability of SparkSpatialSDK to big geospatial data processing.
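The abstract names two example operations on SpatialRDD: spatial range queries and spatial k-nearest neighbor (kNN) queries. The paper's actual API is not reproduced in this record, so the following is only a minimal, hypothetical sketch of how such queries decompose over partitioned data. It assumes point geometries only (no lines or polygons) and uses plain Python lists as stand-ins for RDD partitions so it runs without Spark. All names (`Point`, `Envelope`, `partition`, `range_query`, `knn_query`) are illustrative and are not SparkSpatialSDK identifiers.

```python
"""Hypothetical sketch of SpatialRDD-style queries (not the SparkSpatialSDK API).

Lists of points stand in for RDD partitions; the range query is a
per-partition filter, and kNN is a per-partition top-k followed by a merge.
"""
from __future__ import annotations

import heapq
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned query window given by its min/max corners."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, p: Point) -> bool:
        return self.min_x <= p.x <= self.max_x and self.min_y <= p.y <= self.max_y


Partition = List[Point]


def partition(points: List[Point], num_partitions: int) -> List[Partition]:
    """Round-robin split, standing in for how an RDD spreads records."""
    parts: List[Partition] = [[] for _ in range(num_partitions)]
    for i, p in enumerate(points):
        parts[i % num_partitions].append(p)
    return parts


def range_query(parts: List[Partition], window: Envelope) -> List[Point]:
    """Spatial range query: an independent filter applied to each partition."""
    return [p for part in parts for p in part if window.contains(p)]


def knn_query(parts: List[Partition], q: Point, k: int) -> List[Point]:
    """kNN query: local top-k per partition, then a global top-k merge."""
    def dist(p: Point) -> float:
        return math.hypot(p.x - q.x, p.y - q.y)

    # Per-partition step: each partition keeps only its k closest points.
    local = [heapq.nsmallest(k, part, key=dist) for part in parts]
    # Merge step: the global top-k must lie within the union of local top-k lists.
    candidates = [p for cands in local for p in cands]
    return heapq.nsmallest(k, candidates, key=dist)


if __name__ == "__main__":
    pts = [Point(float(x), float(y)) for x in range(5) for y in range(5)]
    parts = partition(pts, 3)
    print(len(range_query(parts, Envelope(1, 1, 2, 2))))  # 4
    print(knn_query(parts, Point(0.1, 0.1), 1))  # [Point(x=0.0, y=0.0)]
```

In an actual Spark job, the range filter would map onto an RDD `filter` transformation, and the two-phase kNN onto `mapPartitions` followed by a merge of the small per-partition candidate lists; the abstract does not say whether SparkSpatialSDK uses exactly this strategy or adds a spatial index.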
Pages: 239-242
Number of pages: 4