A collaborative semantic-based provenance management platform for reproducibility

Cited by: 0
Authors
Samuel S. [1, 2]
König-Ries B. [1, 2]
Affiliations
[1] Michael Stifel Center Jena, Jena
[2] Heinz Nixdorf Chair for Distributed Information Systems, Friedrich-Schiller Universität Jena, Thuringia, Jena
Keywords
Jupyter notebooks; Ontology; Provenance; Reproducibility; Research data management platform; Scientific experiments; Semantic web; Visualization
DOI
10.7717/PEERJ-CS.921
Abstract
Scientific data management plays a key role in the reproducibility of scientific results. To reproduce results, not only the results but also the data and steps of scientific experiments must be made findable, accessible, interoperable, and reusable. Tracking, managing, describing, and visualizing provenance helps in the understandability, reproducibility, and reuse of experiments for the scientific community. Current systems lack a link between the data, steps, and results from the computational and non-computational processes of an experiment. Such a link, however, is vital for the reproducibility of results. We present a novel solution for the end-to-end provenance management of scientific experiments. We provide a framework, CAESAR (CollAborative Environment for Scientific Analysis with Reproducibility), which allows scientists to capture, manage, query and visualize the complete path of a scientific experiment consisting of computational and non-computational data and steps in an interoperable way. CAESAR integrates the REPRODUCE-ME provenance model, extended from existing semantic web standards, to represent the whole picture of an experiment describing the path it took from its design to its result. ProvBook, an extension for Jupyter Notebooks, is developed and integrated into CAESAR to support computational reproducibility. We have applied and evaluated our contributions to a set of scientific experiments in microscopy research projects. © Copyright 2022 Samuel and König-Ries