Container-based bioinformatics with Pachyderm

Times cited: 27
Authors
Novella, Jon Ander [1, 2]
Emami Khoonsari, Payam [3]
Herman, Stephanie [1, 2, 3]
Whitenack, Daniel [4]
Capuccini, Marco [1, 2, 5]
Burman, Joachim [6]
Kultima, Kim [3]
Spjuth, Ola [1, 2]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, S-75214 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, S-75214 Uppsala, Sweden
[3] Uppsala Univ, Dept Med Sci, Clin Chem, S-75185 Uppsala, Sweden
[4] Pachyderm Inc, San Francisco, CA 94107 USA
[5] Uppsala Univ, Dept Informat Technol, S-75105 Uppsala, Sweden
[6] Uppsala Univ, Dept Neurosci, S-75185 Uppsala, Sweden
Funding
Swedish Research Council; European Union Horizon 2020
Keywords
MASS-SPECTROMETRY
DOI
10.1093/bioinformatics/bty699
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Computational biologists face many challenges related to data size, and they need to manage complicated analyses often including multiple stages and multiple tools, all of which must be deployed to modern infrastructures. To address these challenges and maintain reproducibility of results, researchers need (i) a reliable way to run processing stages in any computational environment, (ii) a well-defined way to orchestrate those processing stages and (iii) a data management layer that tracks data as it moves through the processing pipeline.

Results: Pachyderm is an open-source workflow system and data management framework that fulfils these needs by creating a data pipelining and data versioning layer on top of projects from the container ecosystem, having Kubernetes as the backbone for container orchestration. We adapted Pachyderm and demonstrated its attractive properties in bioinformatics. A Helm Chart was created so that researchers can use Pachyderm in multiple scenarios. The Pachyderm File System was extended to support block storage. A wrapper for initiating Pachyderm on cloud-agnostic virtual infrastructures was created. The benefits of Pachyderm are illustrated via a large metabolomics workflow, demonstrating that Pachyderm enables efficient and sustainable data science workflows while maintaining reproducibility and scalability.
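
To make the three needs named in the abstract more concrete (containerized stages, orchestration of those stages, and tracked data between them), the following is a minimal sketch of how a two-stage pipeline might be described for Pachyderm. It is not taken from the paper. The field names (`pipeline`, `transform`, `input.pfs`) follow Pachyderm's pipeline specification as commonly documented and may differ between Pachyderm versions; the repository name, image names, and commands are hypothetical placeholders.

```python
import json


def make_pipeline_spec(name, input_repo, image, cmd, glob="/*"):
    """Build one pipeline spec as a plain dict.

    Assumption: the layout mirrors Pachyderm's JSON pipeline spec,
    where a stage runs `cmd` inside the container `image` and reads
    its input from the versioned repository `input_repo`.
    """
    return {
        "pipeline": {"name": name},
        "transform": {"image": image, "cmd": list(cmd)},
        # `glob` controls how the input is split into units of work.
        "input": {"pfs": {"repo": input_repo, "glob": glob}},
    }


def build_chain(source_repo, stages):
    """Chain stages so each one reads the previous stage's output.

    Assumption: a pipeline's output lands in a repository named
    after the pipeline, so the next stage uses that name as input.
    """
    specs = []
    upstream = source_repo
    for name, image, cmd in stages:
        specs.append(make_pipeline_spec(name, upstream, image, cmd))
        upstream = name
    return specs


# Hypothetical two-stage workflow; none of these names come from the paper.
specs = build_chain(
    "raw-spectra",
    [
        ("peak-picking", "example/peak-picker:1.0",
         ["sh", "-c", "pick-peaks /pfs/raw-spectra /pfs/out"]),
        ("feature-table", "example/feature-table:1.0",
         ["sh", "-c", "build-table /pfs/peak-picking /pfs/out"]),
    ],
)

if __name__ == "__main__":
    for spec in specs:
        print(json.dumps(spec, indent=2))
```

Each printed document could, under these assumptions, be saved to a file and submitted with Pachyderm's command-line client. Because every stage names its container image and its input repository explicitly, the description itself records which tool version processed which data.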
Pages: 839-846
Number of pages: 8