Distributed In Situ Processing of Big Raster Data in the Cloud

Cited by: 4
Authors
Zalipynis, Ramon Antonio Rodriges [1]
Affiliations
[1] Natl Res Univ, Higher Sch Econ, Moscow, Russia
Source
PERSPECTIVES OF SYSTEM INFORMATICS, PSI 2017 | 2018 / Vol. 10742
Funding
Russian Foundation for Basic Research;
Keywords
Big raster data; Climate reanalysis; Distributed systems; Cloud computing; SciDB; Array DBMS; In situ; NetCDF operators;
DOI
10.1007/978-3-319-74313-4_24
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
A raster is the primary data type in Earth science, geology, remote sensing and other fields with tremendous growth of data volumes. An array DBMS is an option to tackle big raster data processing. However, raster data are traditionally stored in files, not in databases. Command line tools have long been developed to process raster files. Most tools are feature-rich and free but optimized for a single machine. This paper proposes new techniques for distributed processing of raster data directly in diverse file formats by delegating considerable portions of work to such tools. An N-dimensional array data model is proposed to maintain independence from the files and the tools. Also, a new scheme named GROUP-APPLY-FINALLY is presented to universally express the majority of raster data processing operations and streamline their distributed execution. The new approaches make it possible to provide a rich collection of raster operations at scale and outperform SciDB by more than 410x on average on climate reanalysis data. SciDB is the only freely available distributed array DBMS to date. Experiments were carried out on 8- and 16-node clusters in Microsoft Azure Cloud.
Pages: 337-351
Number of pages: 15