Towards a More Generic and Elastic Metadata Management Model in a Data Lake Environment

Times cited: 0
Authors
Sore, Safiatou [1]
Ouedraogo, Frederic T. [1]
Bikienga, Moustapha [1]
Traore, Yaya [2]
Affiliations
[1] Univ Norbert ZONGO Koudougou, Koudougou, Burkina Faso
[2] Univ Joseph Ki Zerbo Ouagadougou, Ouagadougou, Burkina Faso
Source
2024 16TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, ICMLC 2024 | 2024
Keywords
data lake; metadata; scalability; elasticity; big data
DOI
10.1145/3651671.3651773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The proliferation of heterogeneous data sources is leading to the emergence of several new concepts. One of the best known, and a trending topic in the big data space, is the data lake: a central repository that stores heterogeneous data sources in their native format, without any predefined schema. In the absence of an enforced schema, effective metadata management based on metadata models remains an active research topic for addressing the main problem associated with data lakes, the "data swamp". An analysis of existing metadata models shows that none of them is comprehensive. In this paper, we present a generic and scalable metadata model, where scalability refers to the ability to dynamically provision computing resources based on demand and to resize them as needed during metadata integration. Our approach is based on a functional architecture of the data lake, along with a set of features that promote the generality of the metadata model.
Pages: 44-51
Page count: 8