Container-based bioinformatics with Pachyderm

Cited by: 27
Authors
Novella, Jon Ander [1, 2]
Emami Khoonsari, Payam [3]
Herman, Stephanie [1, 2, 3]
Whitenack, Daniel [4]
Capuccini, Marco [1, 2, 5]
Burman, Joachim [6]
Kultima, Kim [3]
Spjuth, Ola [1, 2]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, S-75214 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, S-75214 Uppsala, Sweden
[3] Uppsala Univ, Dept Med Sci, Clin Chem, S-75185 Uppsala, Sweden
[4] Pachyderm Inc, San Francisco, CA 94107 USA
[5] Uppsala Univ, Dept Informat Technol, S-75105 Uppsala, Sweden
[6] Uppsala Univ, Dept Neurosci, S-75185 Uppsala, Sweden
Funding
Swedish Research Council; EU Horizon 2020
Keywords
MASS-SPECTROMETRY;
DOI
10.1093/bioinformatics/bty699
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages and (iii) a data management layer that tracks data as it moves through the processing pipeline.

Results: Pachyderm is an open-source workflow system and data management framework that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.
Pages: 839-846
Page count: 8