GenPipes: an open-source framework for distributed and scalable genomic analyses

Cited by: 132
Authors
Bourgey, Mathieu [1 ,2 ,3 ]
Dali, Rola [1 ,2 ,3 ]
Eveleigh, Robert [1 ,2 ,3 ]
Chen, Kuang Chung [4 ,5 ]
Letourneau, Louis [1 ,2 ,3 ]
Fillon, Joel [6 ]
Michaud, Marc [2 ,3 ]
Caron, Maxime [1 ,2 ,3 ,6 ]
Sandoval, Johanna [7 ]
Lefebvre, Francois [1 ,2 ,3 ]
Leveque, Gary [1 ,2 ,3 ]
Mercier, Eloi [1 ,2 ,3 ]
Bujold, David [1 ,2 ,3 ]
Marquis, Pascale [1 ,2 ,3 ]
Van, Patrick Tran [8 ]
Morais, David Anderson de Lima [9 ]
Tremblay, Julien [10 ]
Shao, Xiaojian [1 ,2 ,3 ]
Henrion, Edouard [1 ,2 ,3 ]
Gonzalez, Emmanuel [1 ,2 ,3 ]
Quirion, Pierre-Olivier [1 ,2 ,3 ]
Caron, Bryan [4 ,5 ]
Bourque, Guillaume [1 ,2 ,3 ,6 ]
Affiliations
[1] Canadian Ctr Computat Genom, Montreal, PQ, Canada
[2] McGill Univ, Montreal, PQ, Canada
[3] Genome Quebec Innovat Ctr, Montreal, PQ, Canada
[4] McGill Univ, McGill HPC Ctr, Montreal, PQ, Canada
[5] Calcul Quebec, Montreal, PQ, Canada
[6] McGill Univ, Dept Human Genet, Montreal, PQ, Canada
[7] Beaulieu Saucier Univ Montreal, Pharmacogen Ctr, Montreal, PQ, Canada
[8] Univ Lausanne, Dept Ecol & Evolut, Lausanne, Switzerland
[9] Univ Sherbrooke, CCS, Sherbrooke, PQ, Canada
[10] Natl Res Council Canada, Energy Min & Environm, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
genomics; workflow management systems; frameworks; workflow; pipeline; bioinformatics; TOOL; DISCOVERY; PROFILES; ACCURATE; CALLER;
DOI
10.1093/gigascience/giz037
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
Background: With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing. Findings: Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated and scalable pipelines for various genomics applications, including RNA sequencing, chromatin immunoprecipitation sequencing, DNA sequencing, methylation sequencing, Hi-C, capture Hi-C, metagenomics, and Pacific Biosciences long-read assembly. The software is available under a GPLv3 open source license and is continuously updated to follow recent advances in genomics and bioinformatics. The framework has already been configured on several servers, and a Docker image is also available to facilitate additional installations. Conclusions: GenPipes offers genomics researchers a simple method to analyze different types of data, customizable to their needs and resources, as well as the flexibility to create their own workflows.
Pages: 11
Related papers
70 in total
[51]   FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix [J].
Price, Morgan N. ;
Dehal, Paramvir S. ;
Arkin, Adam P. .
MOLECULAR BIOLOGY AND EVOLUTION, 2009, 26 (07) :1641-1650
[52]   DELLY: structural variant discovery by integrated paired-end and split-read analysis [J].
Rausch, Tobias ;
Zichner, Thomas ;
Schlattl, Andreas ;
Stuetz, Adrian M. ;
Benes, Vladimir ;
Korbel, Jan O. .
BIOINFORMATICS, 2012, 28 (18) :I333-I339
[53]   GenePattern 2.0 [J].
Reich, M ;
Liefeld, T ;
Gould, J ;
Lerner, J ;
Tamayo, P ;
Mesirov, JP .
NATURE GENETICS, 2006, 38 (05) :500-501
[54]   edgeR: a Bioconductor package for differential expression analysis of digital gene expression data [J].
Robinson, Mark D. ;
McCarthy, Davis J. ;
Smyth, Gordon K. .
BIOINFORMATICS, 2010, 26 (01) :139-140
[55]   VSEARCH: a versatile open source tool for metagenomics [J].
Rognes, Torbjorn ;
Flouri, Tomas ;
Nichols, Ben ;
Quince, Christopher ;
Mahe, Frederic .
PEERJ, 2016, 4
[56]   Bpipe: a tool for running and managing bioinformatics pipelines [J].
Sadedin, Simon P. ;
Pope, Bernard ;
Oshlack, Alicia .
BIOINFORMATICS, 2012, 28 (11) :1525-1526
[57]   Variation in genomic landscape of clear cell renal cell carcinoma across Europe [J].
Scelo, Ghislaine ;
Riazalhosseini, Yasser ;
Greger, Liliana ;
Letourneau, Louis ;
Gonzalez-Porta, Mar ;
Wozniak, Magdalena B. ;
Bourgey, Mathieu ;
Harnden, Patricia ;
Egevad, Lars ;
Jackson, Sharon M. ;
Karimzadeh, Mehran ;
Arseneault, Madeleine ;
Lepage, Pierre ;
How-Kit, Alexandre ;
Daunay, Antoine ;
Renault, Victor ;
Blanche, Helene ;
Tubacher, Emmanuel ;
Sehmoun, Jeremy ;
Viksna, Juris ;
Celms, Edgars ;
Opmanis, Martins ;
Zarins, Andris ;
Vasudev, Naveen S. ;
Seywright, Morag ;
Abedi-Ardekani, Behnoush ;
Carreira, Christine ;
Selby, Peter J. ;
Cartledge, Jon J. ;
Byrnes, Graham ;
Zavadil, Jiri ;
Su, Jing ;
Holcatova, Ivana ;
Brisuda, Antonin ;
Zaridze, David ;
Moukeria, Anush ;
Foretova, Lenka ;
Navratilova, Marie ;
Mates, Dana ;
Jinga, Viorel ;
Artemov, Artem ;
Nedoluzhko, Artem ;
Mazur, Alexander ;
Rastorguev, Sergey ;
Boulygina, Eugenia ;
Heath, Simon ;
Gut, Marta ;
Bihoreau, Marie-Therese ;
Lechner, Doris ;
Foglio, Mario .
NATURE COMMUNICATIONS, 2014, 5
[58]   TopDom: an efficient and deterministic method for identifying topological domains in genomes [J].
Shin, Hanjun ;
Shi, Yi ;
Dai, Chao ;
Tjong, Harianto ;
Gong, Ke ;
Alber, Frank ;
Zhou, Xianghong Jasmine .
NUCLEIC ACIDS RESEARCH, 2016, 44 (07)
[59]   Van der Auwera Geraldine A, 2013, Curr Protoc Bioinformatics, V43, DOI [10.1002/0471250953.bi1201s43, 10.1002/0471250953.bi1110s43]
[60]   The International Human Epigenome Consortium: A Blueprint for Scientific Collaboration and Discovery [J].
Stunnenberg, Hendrik G. ;
Hirst, Martin .
CELL, 2016, 167 (05) :1145-1149