Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data

Times cited: 14
Authors
Hu, Fei [1, 2]
Xu, Mengchao [1, 2]
Yang, Jingchao [1, 2]
Liang, Yanshou [1, 2]
Cui, Kejin [1, 2]
Little, Michael M. [3]
Lynnes, Christopher S. [3]
Duffy, Daniel Q. [3]
Yang, Chaowei [1, 2]
Affiliations
[1] George Mason University, NSF Spatiotemporal Innovation Center, Fairfax, VA 22030, USA
[2] George Mason University, Department of Geography and GeoInformation Science, Fairfax, VA 22030, USA
[3] NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
Source
ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION | 2018 / Vol. 7 / No. 4
Funding
U.S. National Science Foundation
Keywords
big data; data container; geospatial raster data management; GIS; SYSTEM; PERFORMANCE;
DOI
10.3390/ijgi7040144
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address this challenge, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis, and some have been developed or extended specifically to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark and SpatialHadoop extend Spark and Hadoop, respectively, to handle vector data. However, few studies have systematically compared and evaluated the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are drawn: (1) none of the data containers except ClimateSpark supports the HDF data format used in this paper well, so loading data requires time- and resource-consuming preprocessing; (2) SciDB, Rasdaman, and MongoDB handle queries over small to medium data volumes well, whereas Spark and ClimateSpark can handle large data volumes with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, which the other containers lack; and (4) SciDB, Spark, and Hive offer better support for user-defined functions (UDFs) to extend system capabilities.
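The abstract compares containers by logical data model, physical data model, and query speed. As a minimal, hedged illustration of why chunked (tiled) array storage matters for raster subsetting, the Python sketch below partitions a toy (time, lat, lon) grid into regular chunks and answers a bounding-box average by reading only the chunks that intersect the box. The chunk shape, the in-memory dict layout, and the helper names (`chunk_array`, `chunks_for_box`, `subset_mean`) are assumptions made for illustration. They do not reproduce the storage engine or query API of Rasdaman, SciDB, or any other system evaluated in the paper.

```python
from itertools import product

# Illustrative sketch only: chunk_array, chunks_for_box and subset_mean are
# hypothetical helpers, not APIs of SciDB, Rasdaman, or any other container.


def chunk_array(get, shape, chunk):
    """Partition a dense 3-D (time, lat, lon) array into regular chunks.

    get(t, y, x) returns a cell value. Each chunk is a dict mapping
    (t, y, x) -> value, keyed by the chunk's position in the chunk grid.
    """
    chunks = {}
    for t, y, x in product(*(range(n) for n in shape)):
        key = (t // chunk[0], y // chunk[1], x // chunk[2])
        chunks.setdefault(key, {})[(t, y, x)] = get(t, y, x)
    return chunks


def chunks_for_box(lo, hi, chunk):
    """Return the chunk-grid keys that intersect the inclusive box [lo, hi]."""
    ranges = [range(start // size, stop // size + 1)
              for start, stop, size in zip(lo, hi, chunk)]
    return list(product(*ranges))


def subset_mean(chunks, lo, hi, chunk):
    """Mean of the cells inside [lo, hi], reading only intersecting chunks.

    Returns (mean, number_of_chunks_read). Chunks outside the box are never
    visited, which is what keeps small spatiotemporal subsets cheap.
    """
    touched = chunks_for_box(lo, hi, chunk)
    total, count = 0.0, 0
    for key in touched:
        for cell, value in chunks.get(key, {}).items():
            # Keep only cells that fall inside the query box in every dimension.
            if all(s <= i <= e for i, s, e in zip(cell, lo, hi)):
                total += value
                count += 1
    return total / count, len(touched)


if __name__ == "__main__":
    # Toy 4 x 8 x 8 grid split into 2 x 4 x 4 chunks (8 chunks in total).
    shape, chunk = (4, 8, 8), (2, 4, 4)
    grid = chunk_array(lambda t, y, x: t * 100 + y * 10 + x, shape, chunk)
    mean, read = subset_mean(grid, (0, 0, 0), (1, 3, 3), chunk)
    print(mean, read)  # 66.5 1
```

In this toy run the query box from (0, 0, 0) to (1, 3, 3) lies inside a single chunk, so only one of the eight chunks is read; a box that straddles chunk boundaries touches more chunks, which is one reason chunk shape affects subsetting performance.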
Pages: 22