Reproducibility of Computational Experiments on Kubernetes-Managed Container Clouds with HyperFlow

Cited by: 4
Authors
Orzechowski, Michal [1]
Balis, Bartosz [1]
Slota, Renata G. [1]
Kitowski, Jacek [1, 2]
Affiliations
[1] AGH Univ Sci & Technol, Dept Comp Sci, Krakow, Poland
[2] AGH Univ Sci & Technol, ACK Cyfronet AGH, Krakow, Poland
Source
COMPUTATIONAL SCIENCE - ICCS 2020, PT I | 2020 / Vol. 12137
Keywords
Scientific workflows; Reproducibility; Cloud computing; Application containers; Container clouds; Kubernetes; SCIENCE; SYSTEM
DOI
10.1007/978-3-030-50371-0_16
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We propose a comprehensive solution for reproducibility of scientific workflows. We focus particularly on Kubernetes-managed container clouds, which are increasingly important in scientific computing. Our solution addresses conservation of the scientific procedure, scientific data, execution environment and experiment deployment, while using standard tools in order to avoid maintainability issues that can obstruct reproducibility. We introduce an Experiment Digital Object (EDO), a record published in an open science repository that contains the artifacts required to reproduce an experiment. We demonstrate a variety of reproducibility scenarios, including experiment repetition (same experiment and conditions) and replication (same experiment, different conditions), and we propose a smart reuse scenario in which a previous experiment is partially replayed and partially re-executed. The approach is implemented in the HyperFlow workflow management system and experimentally evaluated using a genomic scientific workflow. The experiment is published as an EDO record on the Zenodo platform.
Pages: 220-233
Page count: 14