Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data

Cited by: 32

Authors:

Li, Zhenlong [1]

Yang, Chaowei [2]

Liu, Kai [2]

Hu, Fei [2]

Jin, Baoxuan [3]

Affiliations:

[1] Univ South Carolina, Dept Geog, Columbia, SC 29208 USA

[2] George Mason Univ, Spatiotemporal Innovat Ctr, Fairfax, VA 22030 USA

[3] Yunnan Prov Geomat Ctr, Kunming 650034, Peoples R China

Source:

ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION | 2016 / Vol. 5 / Issue 10

Funding:

U.S. National Science Foundation;

Keywords:

geoprocessing; cloud computing; big data; geospatial cyberinfrastructure; Hadoop; CYBERINFRASTRUCTURE; MAPREDUCE; FRAMEWORK; GIS;

DOI:

10.3390/ijgi5100173

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Efficient processing of big geospatial data is crucial for tackling global and regional challenges such as climate change and natural disasters, but it is challenging not only due to the massive data volume but also due to the intrinsic complexity and high dimensions of the geospatial datasets. While traditional computing infrastructure does not scale well with the rapidly increasing data volume, Hadoop has attracted increasing attention in geoscience communities for handling big geospatial data. Recently, many studies were carried out to investigate adopting Hadoop for processing big geospatial data, but how to adjust the computing resources to efficiently handle the dynamic geoprocessing workload was barely explored. To bridge this gap, we propose a novel framework to automatically scale the Hadoop cluster in the cloud environment to allocate the right amount of computing resources based on the dynamic geoprocessing workload. The framework and auto-scaling algorithms are introduced, and a prototype system was developed to demonstrate the feasibility and efficiency of the proposed scaling mechanism using Digital Elevation Model (DEM) interpolation as an example. Experimental results show that this auto-scaling framework could (1) significantly reduce the computing resource utilization (by 80% in our example) while delivering similar performance as a full-powered cluster; and (2) effectively handle the spike processing workload by automatically increasing the computing resources to ensure the processing is finished within an acceptable time. Such an auto-scaling approach provides a valuable reference to optimize the performance of geospatial applications to address data- and computational-intensity challenges in GIScience in a more cost-efficient manner.


Pages: 14
