Ingestion of a Data Lake into a NoSQL Data Warehouse: The Case of Relational Databases

Times cited: 2
Authors
Abdelhedi, Fatma [1]
Jemmali, Rym [1,2]
Zurfluh, Gilles [2]
Affiliations
[1] Trimane, CBI2, Paris, France
[2] Toulouse Univ, IRIT CNRS UMR 5505, Toulouse, France
Source
PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (KMIS), VOL 3 | 2021
Keywords
Data Lake; Data Warehouse; NoSQL; Big Data; Relational Database; MDA; QVT
DOI
10.5220/0010690600003064
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
The exponential growth of collected data that has accompanied the digital transformation of companies has driven the evolution of databases towards Big Data. Our work falls within this context and focuses on mechanisms for extracting datasets from a Data Lake and storing them in a single Data Warehouse. The Data Warehouse then supports decision-support analyses, facilitated by the features of NoSQL systems (rich data structures, query language, access performance). This article proposes an extraction mechanism that applies only to the relational databases of the Data Lake. The mechanism relies on an automatic approach based on the Model Driven Architecture (MDA), which provides a set of schema transformation rules formalized in the Query/View/Transformation (QVT) language. Starting from the physical schemas that describe the relational databases, we propose transformation rules that generate the physical model of a Data Warehouse stored in a document-oriented NoSQL system (OrientDB). This paper presents the successive steps of the transformation process, from the meta-modeling of the datasets to the application of the rules and algorithms. We report an experiment based on a case study in the health care field.
Pages: 64-72
Page count: 9