GenomeVIP: a cloud platform for genomic variant discovery and interpretation

Cited by: 11
Authors
Mashl, R. Jay [1, 2]
Scott, Adam D. [1, 2]
Huang, Kuan-lin [1, 2]
Wyczalkowski, Matthew A. [1]
Yoon, Christopher J. [1, 2]
Niu, Beifang [1]
DeNardo, Erin [1]
Yellapantula, Venkata D. [1, 2]
Handsaker, Robert E. [3, 4]
Chen, Ken [5]
Koboldt, Daniel C. [1]
Ye, Kai [1, 2]
Fenyo, David [6]
Raphael, Benjamin J. [7, 8]
Wendl, Michael C. [1, 9, 10]
Ding, Li [1, 2, 9, 11]
Affiliations
[1] Washington Univ, McDonnell Genome Inst, St Louis, MO 63108 USA
[2] Washington Univ, Dept Med, Div Oncol, St Louis, MO 63108 USA
[3] Broad Inst, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
[4] Harvard Med Sch, Dept Genet, Boston, MA 02115 USA
[5] Univ Texas MD Anderson Canc Ctr, Dept Bioinformat & Computat Biol, Houston, TX 77030 USA
[6] NYU, Langone Med Ctr, New York, NY 10016 USA
[7] Brown Univ, Dept Comp Sci, Providence, RI 02912 USA
[8] Brown Univ, Ctr Computat Mol Biol, Providence, RI 02912 USA
[9] Washington Univ, Dept Genet, St Louis, MO 63108 USA
[10] Washington Univ, Dept Math, St Louis, MO 63108 USA
[11] Washington Univ, Siteman Canc Ctr, St Louis, MO 63108 USA
Keywords
FRAMEWORK; BIOINFORMATICS; MUTATION
DOI
10.1101/gr.211656.116
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional "download and analyze" paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets.
Pages: 1450-1459
Page count: 10