Investigating reproducibility and tracking provenance - A genomic workflow case study

Cited by: 39
Authors
Kanwal, Sehrish [1]
Khan, Farah Zaib [1]
Lonie, Andrew [2]
Sinnott, Richard O. [1]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Melbourne, Vic 3010, Australia
[2] Univ Melbourne, Melbourne Bioinformat, Melbourne, Vic 3010, Australia
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Keywords
Reproducibility; Provenance; Workflow; Galaxy; Cpipe; Common Workflow Language (CWL); ENHANCING REPRODUCIBILITY; MOLECULAR-BIOLOGY; FRAMEWORK; WEB; TOOLKIT; SYSTEM;
DOI
10.1186/s12859-017-1747-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Computational bioinformatics workflows are extensively used to analyse genomics data, with different approaches available to support implementation and execution of these workflows. Reproducibility is one of the core principles for any scientific workflow and remains a challenge, which is not fully addressed. This is due to incomplete understanding of reproducibility requirements and assumptions of workflow definition approaches. Provenance information should be tracked and used to capture all these requirements supporting reusability of existing workflows.
Results: We have implemented a complex but widely deployed bioinformatics workflow using three representative approaches to workflow definition and execution. Through implementation, we identified assumptions implicit in these approaches that ultimately produce insufficient documentation of workflow requirements resulting in failed execution of the workflow. This study proposes a set of recommendations that aims to mitigate these assumptions and guides the scientific community to accomplish reproducible science, hence addressing reproducibility crisis.
Conclusions: Reproducing, adapting or even repeating a bioinformatics workflow in any environment requires substantial technical knowledge of the workflow execution environment, resolving analysis assumptions and rigorous compliance with reproducibility requirements. Towards these goals, we propose conclusive recommendations that along with an explicit declaration of workflow specification would result in enhanced reproducibility of computational genomic analyses.
Pages: 14
Related papers
50 in total
  • [1] Investigating reproducibility and tracking provenance – A genomic workflow case study
    Sehrish Kanwal
    Farah Zaib Khan
    Andrew Lonie
    Richard O. Sinnott
    BMC Bioinformatics, 18
  • [2] Provenance and data differencing for workflow reproducibility analysis
    Missier, Paolo
    Woodman, Simon
    Hiden, Hugo
    Watson, Paul
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (04): : 995 - 1015
  • [3] Findable and reusable workflow data products: A genomic workflow case study
    Gaignard, Alban
    Skaf-Molli, Hala
    Belhajjame, Khalid
    SEMANTIC WEB, 2020, 11 (05) : 751 - 763
  • [4] Re-provisioning of Cloud-Based Execution Infrastructure Using the Cloud-Aware Provenance to Facilitate Scientific Workflow Execution Reproducibility
    Hasham, Khawar
    Munir, Kamran
    McClatchey, Richard
    Shamdasani, Jetendr
    CLOUD COMPUTING AND SERVICES SCIENCE, CLOSER 2015, 2016, 581 : 74 - 94
  • [5] A Study of Genomic Data Provenance in NoSQL Document-Oriented Database Systems
    Guimaraes, Valeria
    Hondo, Fernanda
    Almeida, Rodrigo
    Vera, Harley
    Holanda, Maristela
    Araujo, Aleteia
    Walter, Maria Emilia
    Lifschitz, Sergio
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015, : 1525 - 1531
  • [6] Experimenting with reproducibility: a case study of robustness in bioinformatics
    Kim, Yang-Min
    Poline, Jean-Baptiste
    Dumas, Guillaume
    GIGASCIENCE, 2018, 7 (07):
  • [7] Workflow and CIMOSA - background and case study
    Dickerhof, M
    Didic, MM
    Mampel, U
    COMPUTERS IN INDUSTRY, 1999, 40 (2-3) : 197 - 205
  • [8] Application of provenance in social computing: A case study
    Riveni, Mirela
    Tien-Dung Nguyen
    Aktas, Mehmet S.
    Dustdar, Schahram
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (03):
  • [9] Reproducibility of Indian DH Projects: A case study
    Jyothi Justin
    Nirmala Menon
    International Journal of Digital Humanities, 2023, 5 (2-3) : 333 - 351
  • [10] A primary Raman microscopic study of the turquoise and its role in provenance-tracking
    She Ling-zhu
    Qin Ying
    Feng Min
    Mao Zhen-wei
    Xu Cun-yi
    Huang Feng-chun
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2008, 28 (09) : 2107 - 2110