Data variety, come as you are in multi-model data warehouses

Cited by: 7
Authors
Bimonte, Sandro [1]
Gallinucci, Enrico [2]
Marcel, Patrick [3]
Rizzi, Stefano [2]
Affiliations
[1] Univ Clermont Auvergne, INRAE TSCF, Aubiere, France
[2] Univ Bologna, DISI, Bologna, Italy
[3] Univ Tours, LIFAT Lab, Tours, France
Keywords
OLAP; Multi-model databases; Data variety; Data warehouse; METRICS
DOI
10.1016/j.is.2021.101734
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Multi-model DBMSs (MMDBMSs) have recently been introduced to store and seamlessly query heterogeneous data (structured, semi-structured, graph-based, etc.) in their native form, with the aim of effectively preserving their variety. Unfortunately, when it comes to analyzing these data, traditional data warehouses (DWs) and OLAP systems fall short because they rely on relational DBMSs for storage and querying, thus constraining data variety into the rigidity of a structured, fixed schema. In this paper, we investigate the performance of an MMDBMS when used to store multidimensional data for OLAP analyses. A multi-model DW would store each of its elements according to its native model; among the benefits we envision for this solution are bridging the architectural gap between data lakes and DWs, reducing the cost of ETL, and ensuring better flexibility, extensibility, and evolvability thanks to the combined use of structured and schemaless data. To support our investigation, we define a multidimensional schema for the UniBench benchmark dataset and an ad hoc OLAP workload for it. Then we propose and compare three logical solutions implemented on the PostgreSQL multi-model DBMS: one that extends a star schema with JSON, XML, graph-based, and key-value data; one based on a classical (fully relational) star schema; and one where all data are kept in their native form (no relational data are introduced). As expected, the fully relational implementation generally performs better than the multi-model one, but this is balanced by the benefits of MMDBMSs in dealing with variety. Finally, we give our perspective on research on this topic. © 2021 Elsevier Ltd. All rights reserved.
Pages: 15