Data Provenance in Biomedical Research: Scoping Review

Cited by: 8
Authors
Johns, Marco [1]
Meurers, Thierry [1]
Wirth, Felix N. [1]
Haber, Anna C. [1]
Mueller, Armin [1]
Halilovic, Mehmed [1]
Balzer, Felix [2]
Prasser, Fabian [1]
Affiliations
[1] Charite Univ Med Berlin, Berlin Inst Hlth, Med Informat Grp, Charite Pl 1, D-10117 Berlin, Germany
[2] Charite Univ Med Berlin, Inst Med Informat, Berlin, Germany
Funding
UK Research and Innovation (UKRI)
Keywords
data provenance; biomedical research; scoping review; systematization; comparison; VISUALIZATION; REPRODUCIBILITY; SCIENCE; MODEL;
DOI
10.2196/42289
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Data provenance refers to the origin, processing, and movement of data. Reliable and precise knowledge about data provenance has great potential to improve reproducibility as well as quality in biomedical research and, therefore, to foster good scientific practice. However, despite the increasing interest in data provenance technologies in the literature and their implementation in other disciplines, these technologies have not yet been widely adopted in biomedical research.
Objective: The aim of this scoping review was to provide a structured overview of the body of knowledge on provenance methods in biomedical research by systematizing articles covering data provenance technologies developed for or used in this application area; describing and comparing the functionalities as well as the design of the provenance technologies used; and identifying gaps in the literature, which could provide opportunities for future research on technologies that could receive more widespread adoption.
Methods: Following a methodological framework for scoping studies and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines, articles were identified by searching the PubMed, IEEE Xplore, and Web of Science databases and subsequently screened for eligibility. We included original articles covering software-based provenance management for scientific research published between 2010 and 2021. A set of data items was defined along the following five axes: publication metadata, application scope, provenance aspects covered, data representation, and functionalities. The data items were extracted from the articles, stored in a charting spreadsheet, and summarized in tables and figures.
Results: We identified 44 original articles published between 2010 and 2021. We found that the solutions described were heterogeneous along all axes. We also identified relationships among motivations for the use of provenance information, feature sets (capture, storage, retrieval, visualization, and analysis), and implementation details such as the data models and technologies used. An important gap that we identified is that only a few publications address the analysis of provenance data or use established provenance standards, such as PROV.
Conclusions: The heterogeneity of provenance methods, models, and implementations found in the literature points to the lack of a unified understanding of provenance concepts for biomedical data. Providing a common framework, a biomedical reference, and benchmarking data sets could foster the development of more comprehensive provenance solutions.
Pages: 17