GenPipes: an open-source framework for distributed and scalable genomic analyses

Times cited: 131
Authors
Bourgey, Mathieu [1, 2, 3]
Dali, Rola [1, 2, 3]
Eveleigh, Robert [1, 2, 3]
Chen, Kuang Chung [4, 5]
Letourneau, Louis [1, 2, 3]
Fillon, Joel [6]
Michaud, Marc [2, 3]
Caron, Maxime [1, 2, 3, 6]
Sandoval, Johanna [7]
Lefebvre, Francois [1, 2, 3]
Leveque, Gary [1, 2, 3]
Mercier, Eloi [1, 2, 3]
Bujold, David [1, 2, 3]
Marquis, Pascale [1, 2, 3]
Van, Patrick Tran [8]
Morais, David Anderson de Lima [9]
Tremblay, Julien [10]
Shao, Xiaojian [1, 2, 3]
Henrion, Edouard [1, 2, 3]
Gonzalez, Emmanuel [1, 2, 3]
Quirion, Pierre-Olivier [1, 2, 3]
Caron, Bryan [4, 5]
Bourque, Guillaume [1, 2, 3, 6]
Affiliations
[1] Canadian Ctr Computat Genom, Montreal, PQ, Canada
[2] McGill Univ, Montreal, PQ, Canada
[3] Genome Quebec Innovat Ctr, Montreal, PQ, Canada
[4] McGill Univ, McGill HPC Ctr, Montreal, PQ, Canada
[5] Calcul Quebec, Montreal, PQ, Canada
[6] McGill Univ, Dept Human Genet, Montreal, PQ, Canada
[7] Beaulieu Saucier Univ Montreal, Pharmacogen Ctr, Montreal, PQ, Canada
[8] Univ Lausanne, Dept Ecol & Evolut, Lausanne, Switzerland
[9] Univ Sherbrooke, CCS, Sherbrooke, PQ, Canada
[10] Natl Res Council Canada, Energy Min & Environm, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
genomics; workflow management systems; frameworks; workflow; pipeline; bioinformatics; TOOL; DISCOVERY; PROFILES; ACCURATE; CALLER;
DOI
10.1093/gigascience/giz037
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
Background: With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing. Findings: Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated and scalable pipelines for various genomics applications, including RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. The software is available under a GPLv3 open source license and is continuously updated to follow recent advances in genomics and bioinformatics. The framework has already been configured on several servers, and a Docker image is also available to facilitate additional installations. Conclusions: GenPipes offers genomics researchers a simple method to analyze different types of data, customizable to their needs and resources, as well as the flexibility to create their own workflows.
Pages: 11
Related papers
70 items in total
[1]   The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update [J].
Afgan, Enis ;
Baker, Dannon ;
van den Beek, Marius ;
Blankenberg, Daniel ;
Bouvier, Dave ;
Cech, Martin ;
Chilton, John ;
Clements, Dave ;
Coraor, Nate ;
Eberhard, Carl ;
Gruening, Bjoern ;
Guerler, Aysam ;
Hillman-Jackson, Jennifer ;
Von Kuster, Greg ;
Rasche, Eric ;
Soranzo, Nicola ;
Turaga, Nitesh ;
Taylor, James ;
Nekrutenko, Anton ;
Goecks, Jeremy .
NUCLEIC ACIDS RESEARCH, 2016, 44 (W1) :W3-W10
[2]   Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[3]   HTSeq-a Python framework to work with high-throughput sequencing data [J].
Anders, Simon ;
Pyl, Paul Theodor ;
Huber, Wolfgang .
BIOINFORMATICS, 2015, 31 (02) :166-169
[4]   Contribution to Alzheimer's disease risk of rare variants in TREM2, SORL1, and ABCA7 in 1779 cases and 1273 controls [J].
Bellenguez, Celine ;
Charbonnier, Camille ;
Grenier-Boley, Benjamin ;
Quenez, Olivier ;
Le Guennec, Kilan ;
Nicolas, Gael ;
Chauhan, Ganesh ;
Wallon, David ;
Rousseau, Stephane ;
Richard, Anne Claire ;
Boland, Anne ;
Bourque, Guillaume ;
Munter, Hans Markus ;
Olaso, Robert ;
Meyer, Vincent ;
Rollin-Sillaire, Adeline ;
Pasquier, Florence ;
Letenneur, Luc ;
Redon, Richard ;
Dartigues, Jean-Francois ;
Tzourio, Christophe ;
Frebourg, Thierry ;
Lathrop, Mark ;
Deleuze, Jean-Francois ;
Hannequin, Didier ;
Genin, Emmanuelle ;
Amouyel, Philippe ;
Debette, Stephanie ;
Lambert, Jean-Charles ;
Campion, Dominique .
NEUROBIOLOGY OF AGING, 2017, 59 :220.e1-220.e9
[5]   Bourgey M, 2019, GIGASCIENCE DATABASE, DOI 10.5524/100575
[6]   Near-optimal probabilistic RNA-seq quantification (vol 34, pg 525, 2016) [J].
Bray, Nicolas L. ;
Pimentel, Harold ;
Melsted, Pall ;
Pachter, Lior .
NATURE BIOTECHNOLOGY, 2016, 34 (08) :888-888
[7]   eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data [J].
Breeze, Charles E. ;
Paul, Dirk S. ;
van Dongen, Jenny ;
Butcher, Lee M. ;
Ambrose, John C. ;
Barrett, James E. ;
Lowe, Robert ;
Rakyan, Vardhman K. ;
Iotchkova, Valentina ;
Frontini, Mattia ;
Downes, Kate ;
Ouwehand, Willem H. ;
Laperle, Jonathan ;
Jacques, Pierre-Etienne ;
Bourque, Guillaume ;
Bergmann, Anke K. ;
Siebert, Reiner ;
Vellenga, Edo ;
Saeed, Sadia ;
Matarese, Filomena ;
Martens, Joost H. A. ;
Stunnenberg, Hendrik G. ;
Teschendorff, Andrew E. ;
Herrero, Javier ;
Birney, Ewan ;
Dunham, Ian ;
Beck, Stephan .
CELL REPORTS, 2016, 17 (08) :2137-2150
[8]   Genomic analysis of diffuse intrinsic pontine gliomas identifies three molecular subgroups and recurrent activating ACVR1 mutations [J].
Buczkowicz, Pawel ;
Hoeman, Christine ;
Rakopoulos, Patricia ;
Pajovic, Sanja ;
Letourneau, Louis ;
Dzamba, Misko ;
Morrison, Andrew ;
Lewis, Peter ;
Bouffet, Eric ;
Bartels, Ute ;
Zuccaro, Jennifer ;
Agnihotri, Sameer ;
Ryall, Scott ;
Barszczyk, Mark ;
Chornenkyy, Yevgen ;
Bourgey, Mathieu ;
Bourque, Guillaume ;
Montpetit, Alexandre ;
Cordero, Francisco ;
Castelo-Branco, Pedro ;
Mangere, Joshua ;
Tabori, Uri ;
Ching, King ;
Huang, Annie ;
Taylor, Kathryn R. ;
Mackay, Alan ;
Bendell, Anne E. ;
Nazarian, Javad ;
Fangusaro, Jason R. ;
Karajannis, Matthias A. ;
Zagzag, David ;
Foreman, Nicholas K. ;
Donson, Andrew ;
Hegert, Julia V. ;
Smith, Amy ;
Chan, Jennifer ;
Lafay-Cousin, Lucy ;
Dunn, Sandra ;
Hukin, Juliette ;
Dunham, Chris ;
Scheinemann, Katrin ;
Michaud, Jean ;
Zelcer, Shayna ;
Ramsay, David ;
Cain, Jason ;
Brennan, Cameron ;
Souweidane, Mark M. ;
Jones, Chris ;
Allis, C. David ;
Brudno, Michael .
NATURE GENETICS, 2014, 46 (05) :451-456
[9]   CernVM - a virtual software appliance for LHC applications [J].
Buncic, P. ;
Sanchez, C. Aguado ;
Blomer, J. ;
Franco, L. ;
Harutyunian, A. ;
Mato, P. ;
Yao, Y. .
17TH INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP09), 2010, 219
[10]   NGSANE: a lightweight production informatics framework for high-throughput data analysis [J].
Buske, Fabian A. ;
French, Hugh J. ;
Smith, Martin A. ;
Clark, Susan J. ;
Bauer, Denis C. .
BIOINFORMATICS, 2014, 30 (10) :1471-1472