Container-based bioinformatics with Pachyderm

Cited by: 27
Authors
Novella, Jon Ander [1, 2]
Emami Khoonsari, Payam [3]
Herman, Stephanie [1, 2, 3]
Whitenack, Daniel [4]
Capuccini, Marco [1, 2, 5]
Burman, Joachim [6]
Kultima, Kim [3]
Spjuth, Ola [1, 2]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, S-75214 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, S-75214 Uppsala, Sweden
[3] Uppsala Univ, Dept Med Sci, Clin Chem, S-75185 Uppsala, Sweden
[4] Pachyderm Inc, San Francisco, CA 94107 USA
[5] Uppsala Univ, Dept Informat Technol, S-75105 Uppsala, Sweden
[6] Uppsala Univ, Dept Neurosci, S-75185 Uppsala, Sweden
Funding
Swedish Research Council; European Union Horizon 2020
Keywords
MASS-SPECTROMETRY;
DOI
10.1093/bioinformatics/bty699
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages and (iii) a data management layer that tracks data as it moves through the processing pipeline.
Results: Pachyderm is an open-source workflow system and data management framework that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.
Pages: 839-846
Number of pages: 8