Data provenance management and application in Clinical Data Spaces

Cited by: 1
Authors
Bao, Xiaoyuan [1]
Jiang, Jingsi [2]
Zhang, Kai [3]
Affiliations
[1] Peking Univ, Hlth Sci Ctr, Med Informat Ctr, Beijing, Peoples R China
[2] Peking Univ, Hlth Sci Ctr, Basic Med Sci, Beijing, Peoples R China
[3] Peking Univ, Hlth Sci Ctr, Peoples Hosp, Beijing, Peoples R China
Source
2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2021) | 2021
Keywords
Data Provenance; Data Management; Clinical Data; DATABASES;
DOI
10.1109/CSCI54926.2021.00060
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the era of big data, managing the growing volume of clinical data has become a research hotspot. To manage the massive data held in clinical data centers, and to clarify data sources, data iterations, and data flows, we propose a data provenance management method based on a three-tier structure, which divides the management of data lineage into a data layer, a semantic layer, and a presentation layer. The method implemented in this paper collects all commands executed by the operating system, the logs generated by each process, and the physical locations and relevant attributes of all data objects, such as data tables, data files, dictionary files, and web pages. The semantic layer processes this information and generates a data lineage log based on the PROV standard proposed by the W3C, which is then displayed at the presentation layer. Our method records the source, processing steps, and storage location for each process, which enables process replay. It also semantically annotates each data object, which better describes the source of each data item, how it was processed, and so on. The method thus enables both replay of the analysis process and retrospective analysis of the output process, and it can be widely used in the management of massive clinical data.
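The flow described in the abstract (captured commands and data objects turned into a PROV-based lineage log that can be traced backwards) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the event records, the `ex:` identifiers, and the functions `build_prov` and `trace_sources` are all hypothetical, and the dictionary layout only loosely follows the PROV-JSON serialization of the W3C PROV model.

```python
import json

# Hypothetical captured events, one per executed command (assumed input format).
EVENTS = [
    {"id": "ex:extract", "command": "extract_labs.sh",
     "inputs": ["ex:his_lab_table"], "outputs": ["ex:labs.csv"]},
    {"id": "ex:clean", "command": "clean_labs.py",
     "inputs": ["ex:labs.csv", "ex:unit_dictionary"],
     "outputs": ["ex:labs_clean.csv"]},
]


def build_prov(events):
    """Turn captured events into a PROV-style lineage document (a plain dict)."""
    doc = {"prefix": {"ex": "http://example.org/"},
           "entity": {}, "activity": {}, "used": {}, "wasGeneratedBy": {}}
    for ev in events:
        # Each executed command becomes a PROV activity.
        doc["activity"][ev["id"]] = {"ex:command": ev["command"]}
        # Every data object touched by the command becomes a PROV entity.
        for name in ev["inputs"] + ev["outputs"]:
            doc["entity"].setdefault(name, {})
        for name in ev["inputs"]:
            key = f"_:u{len(doc['used'])}"
            doc["used"][key] = {"prov:activity": ev["id"], "prov:entity": name}
        for name in ev["outputs"]:
            key = f"_:g{len(doc['wasGeneratedBy'])}"
            doc["wasGeneratedBy"][key] = {"prov:entity": name,
                                          "prov:activity": ev["id"]}
    return doc


def trace_sources(doc, entity):
    """Walk wasGeneratedBy/used relations backwards to the original sources.

    Assumes the lineage graph is acyclic.
    """
    producers = [g["prov:activity"] for g in doc["wasGeneratedBy"].values()
                 if g["prov:entity"] == entity]
    if not producers:
        # Nothing generated this entity, so it is treated as an original source.
        return {entity}
    sources = set()
    for activity in producers:
        for u in doc["used"].values():
            if u["prov:activity"] == activity:
                sources |= trace_sources(doc, u["prov:entity"])
    return sources


if __name__ == "__main__":
    lineage = build_prov(EVENTS)
    print(json.dumps(lineage, indent=2))
    print(sorted(trace_sources(lineage, "ex:labs_clean.csv")))
```

Because each command is recorded as an activity with explicit `used` and `wasGeneratedBy` relations, the same log can in principle serve both purposes named in the abstract: replaying the commands in order, and tracing an output back to the data objects it was derived from.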
Pages: 1255-1258
Number of pages: 4