ST-Hadoop: a MapReduce framework for spatio-temporal data

Cited by: 53
Authors
Alarabi, Louai [1]
Mokbel, Mohamed F. [1]
Musleh, Mashaal [1]
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Funding
US National Science Foundation;
Keywords
MapReduce-based systems; Spatio-temporal systems; Spatio-temporal range query; Spatio-temporal nearest neighbor query; Spatio-temporal join query;
DOI
10.1007/s10707-018-0325-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents ST-Hadoop, the first full-fledged open-source MapReduce framework with native support for spatio-temporal data. ST-Hadoop is a comprehensive extension to Hadoop and SpatialHadoop that injects spatio-temporal data awareness inside each of their layers, namely the language, indexing, and operations layers. In the language layer, ST-Hadoop provides built-in spatio-temporal data types and operations. In the indexing layer, ST-Hadoop spatiotemporally loads and divides data across computation nodes in the Hadoop Distributed File System in a way that mimics spatio-temporal index structures, which results in orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and queries. In the operations layer, ST-Hadoop ships with support for three fundamental spatio-temporal queries, namely spatio-temporal range, top-k nearest neighbor, and join queries. The extensibility of ST-Hadoop allows others to extend its features and operations easily using approaches similar to those described in the paper. Extensive experiments conducted on a large-scale 10 TB dataset containing over 1 billion spatio-temporal records show that ST-Hadoop achieves orders of magnitude better performance than Hadoop and SpatialHadoop when dealing with spatio-temporal data and operations. The key idea behind the performance gained in ST-Hadoop is its ability to index spatio-temporal data within the Hadoop Distributed File System.
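The indexing idea in the abstract, dividing data into spatio-temporal partitions so that a range query only touches the relevant ones, can be illustrated with a minimal in-memory sketch. This is not ST-Hadoop's implementation or API: the fixed grid, the one-day time slices, and all names (`CELL_DEG`, `SLICE_SEC`, `partition_key`, `build_index`, `range_query`) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical partitioning parameters (not taken from the paper).
CELL_DEG = 1.0       # spatial grid cell size, in degrees
SLICE_SEC = 86_400   # temporal slice length, in seconds (one day)


def partition_key(x, y, t):
    """Map a point (x, y) at POSIX time t to a (time slice, cell x, cell y) key."""
    return (int(t // SLICE_SEC), int(x // CELL_DEG), int(y // CELL_DEG))


def build_index(records):
    """Group (x, y, t) records by spatio-temporal partition."""
    index = defaultdict(list)
    for x, y, t in records:
        index[partition_key(x, y, t)].append((x, y, t))
    return index


def range_query(index, x_min, y_min, x_max, y_max, t_min, t_max):
    """Return matching records and the number of partitions actually scanned."""
    lo = partition_key(x_min, y_min, t_min)
    hi = partition_key(x_max, y_max, t_max)
    hits, scanned = [], 0
    for key, bucket in index.items():
        # Prune partitions whose key lies outside the query's key range.
        if not all(l <= k <= h for k, l, h in zip(key, lo, hi)):
            continue
        scanned += 1
        # Refine: check each record in a surviving partition exactly.
        hits.extend(
            r for r in bucket
            if x_min <= r[0] <= x_max
            and y_min <= r[1] <= y_max
            and t_min <= r[2] <= t_max
        )
    return hits, scanned


records = [(0.5, 0.5, 100), (0.7, 0.2, 90_000), (5.5, 5.5, 100), (0.5, 0.5, 200_000)]
index = build_index(records)
hits, scanned = range_query(index, 0.0, 0.0, 1.0, 1.0, 0, 100_000)
```

In this toy run, four partitions exist but the query scans only those overlapping its spatial and temporal extent. A distributed system would apply the same pruning at the level of file blocks rather than in-memory buckets.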
Pages: 785-813
Page count: 29