Metadata Management on Data Processing in Data Lakes

Cited by: 8
Authors
Megdiche, Imen [1]
Ravat, Franck [1]
Zhao, Yan [1, 2]
Affiliations
[1] Inst Rech Informat Toulouse, IRIT CNRS, UMR 5505, Toulouse, France
[2] Ctr Hosp Univ CHU Toulouse, Toulouse, France
Source
SOFSEM 2021: THEORY AND PRACTICE OF COMPUTER SCIENCE | 2021 / Vol. 12607
Keywords
Data lake; Data processing; Metadata management
DOI
10.1007/978-3-030-67731-2_40
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
A data lake (DL) is known as a Big Data analysis solution. It stores not only data but also the processes that were carried out on these data. It is commonly agreed that data preparation and transformation take most of a data analyst's time. To improve the efficiency of data processing in a DL, we propose a framework that includes a metadata model and algebraic transformation operations. The metadata model ensures the findability, accessibility, interoperability and reusability of data processes, as well as the data lineage of processes. Moreover, each process is described through a set of coarse-grained data transformation operations that can be applied to different types of datasets. We illustrate and validate our proposal with an implementation of a real medical use case.
Pages: 553-562
Page count: 10