CWL-Airflow: a lightweight pipeline manager supporting Common Workflow Language

Cited by: 36
Authors
Kotliar, Michael [1,2]
Kartashov, Andrey V. [1,2]
Barski, Artem [1,2,3]
Affiliations
[1] Univ Cincinnati, Div Allergy & Immunol, Cincinnati Childrens Hosp Med Ctr, Cincinnati, OH USA
[2] Univ Cincinnati, Coll Med, Dept Pediat, Cincinnati, OH USA
[3] Univ Cincinnati, Div Human Genet, Cincinnati Childrens Hosp Med Ctr, Cincinnati, OH USA
Funding
National Institutes of Health (USA)
Keywords
Common Workflow Language; workflow manager; pipeline manager; Airflow; reproducible data analysis; workflow portability; ChIP-Seq
DOI
10.1093/gigascience/giz084
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Background: Massive growth in the amount of research data and computational analysis has led to increased use of pipeline managers in biomedical computational research. However, each of the >100 such managers uses its own way to describe pipelines, which makes it difficult to port workflows to different environments and therefore leads to poor reproducibility of computational studies. For this reason, the Common Workflow Language (CWL) was recently introduced as a specification for platform-independent workflow description, and work began to transition existing pipelines and workflow managers to CWL.
Findings: Herein, we present CWL-Airflow, a package that adds support for CWL to the Apache Airflow pipeline manager. CWL-Airflow uses the CWL version 1.0 specification and can run workflows on stand-alone MacOS/Linux servers, on clusters, or on a variety of cloud platforms. A sample CWL pipeline for processing chromatin immunoprecipitation sequencing data is provided.
Conclusions: CWL-Airflow will provide users with the features of a fully fledged pipeline manager and the ability to execute CWL workflows anywhere Airflow can run, from a laptop to a cluster or cloud environment. CWL-Airflow is available under the Apache License, version 2.0 (Apache-2.0), and can be downloaded from https://barski-lab.github.io/cwl-airflow, https://scicrunch.org/resolver/RRID:SCR_017196.
Pages: 8