Spatial big data architecture: From Data Warehouses and Data Lakes to the LakeHouse

Cited by: 9
Authors
Errami, Soukaina Ait [1]
Hajji, Hicham [1]
Kadi, Kenza Ait El [1]
Badir, Hassan [2]
Affiliations
[1] IAV Hassan II Inst, Sch Geomatics & Surveying Engn, Rabat, Morocco
[2] Abdelmalek Essaadi Univ, IDS team, Tangier, Morocco
Keywords
Data architecture; Data LakeHouse; Storage; Spatial data; Distributed systems
DOI
10.1016/j.jpdc.2023.02.007
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The construction of systems supporting spatial data has attracted great interest in the past, owing to the richness of this type of data and its semantics, which can be used in decision-making processes in various fields. The problem of integrating spatial data into existing databases and information systems has thus been addressed by creating spatial extensions to relational tables or by creating spatial data warehouses, while adapting data structures and query languages to make them more spatially aware. With the advent of Big Data, these conventional storage and spatial representation structures are becoming increasingly outdated and require a new organization of spatial data. Approaches based on distributed storage and data lakes have been proposed to integrate the complexity of spatial data with operational and analytical systems, but these unfortunately quickly showed their limits. Recently, the concept of the lakehouse was introduced in order to bring, among other things, reliability and ACID properties to the volume of data to be managed. This new data architecture is a combination of governed and reliable Data Warehouses and flexible, scalable and cost-effective Data Lakes. In this paper, we present how traditional approaches to spatial data management in the context of spatial big data have quickly shown their limits. We present a literature overview of these approaches and how they led to the Data LakeHouse. We detail how the Lakehouse paradigm can be used and extended for managing spatial big data, by giving the different components and best practices for building a spatial data LakeHouse architecture optimized for storage and computing over spatial big data. (c) 2023 Elsevier Inc. All rights reserved.
Pages: 70-79
Number of pages: 10
Related papers
45 in total
  • [1] Aji A, 2015, Arxiv, DOI arXiv:1509.00910
  • [2] Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
    Aji, Ablimit
    Wang, Fusheng
    Vo, Hoang
    Lee, Rubao
    Liu, Qiaoling
    Zhang, Xiaodong
    Saltz, Joel
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (11): 1009-1020
  • [3] Alrehamy H, 2015, PROCEEDINGS 2015 IEEE FIFTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING BDCLOUD 2015, P160, DOI 10.1109/BDCloud.2015.62
  • [4] [Anonymous], About Us
  • [5] Apache Hudi, US
  • [6] Apache Iceberg, US
  • [7] Apache parquet, US
  • [8] Apache Sedona, US
  • [9] Armbrust M., 2021, Proceedings of CIDR
  • [10] Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
    Armbrust, Michael
    Das, Tathagata
    Sun, Liwen
    Yavuz, Burak
    Zhu, Shixiong
    Murthy, Mukul
    Torres, Joseph
    van Hovell, Herman
    Ionescu, Adrian
    Luszczak, Alicja
    Switakowski, Michal
    Szafranski, Michal
    Li, Xiao
    Ueshin, Takuya
    Mokhtar, Mostafa
    Boncz, Peter
    Ghodsi, Ali
    Paranjpye, Sameer
    Senster, Pieter
    Xin, Reynold
    Zaharia, Matei
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (12): 3411-3424