SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data

Cited by: 184
Authors
Wei, Zhi [1]
Wang, Wei [1]
Hu, Pingzhao [2]
Lyon, Gholson J. [3]
Hakonarson, Hakon [3]
Affiliations
[1] New Jersey Inst Technol, Dept Comp Sci, Newark, NJ 07102 USA
[2] Hosp Sick Children, TCAG, Toronto, ON M5G 1L7, Canada
[3] Univ Penn, Childrens Hosp Philadelphia, Dept Pediat, Ctr Appl Genom, Philadelphia, PA 19104 USA
Keywords
GENOME-WIDE ASSOCIATION; LARGE-SCALE ASSOCIATION; HIGH-THROUGHPUT; RARE VARIANTS; COMPLEX TRAITS; DNA; DISCOVERY; DISEASES; IDENTIFICATION; FRAMEWORK;
DOI
10.1093/nar/gkr599
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We develop a statistical tool, SNVer, for calling common and rare variants in the analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of the observed allele frequency against sequencing error. SNVer reports a single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to 'accept or reject the candidates' provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing of 300 K loci within an hour. This excellent scalability makes it feasible for the analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
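
As a rough illustration of the idea in the abstract (per-locus P-values followed by multiplicity control), the sketch below tests each locus with a plain one-sided binomial test against an assumed sequencing error rate and then applies the Benjamini-Hochberg procedure. This is not SNVer's binomial-binomial model and not its code: the function names, the error rate of 0.01, and the read counts are all hypothetical values chosen for illustration.

```python
from math import comb


def binom_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per P-value: True where BH rejects the null."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose ordered P-value satisfies p_(k) <= k/m * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical loci as (alternate-allele reads, total depth); values are made up.
loci = [(12, 100), (1, 100), (3, 200), (30, 150)]
error_rate = 0.01  # assumed per-base sequencing error rate, not taken from the paper

# Null hypothesis at each locus: every alternate read is a sequencing error.
pvals = [binom_sf(alt, depth, error_rate) for alt, depth in loci]
calls = benjamini_hochberg(pvals, alpha=0.05)

for (alt, depth), p, called in zip(loci, pvals, calls):
    print(f"alt={alt:3d} depth={depth:3d} p={p:.3g} variant={called}")
```

Because every locus yields a P-value rather than a yes/no call, the threshold `alpha` can be chosen by the user after the fact, which is the flexibility the abstract emphasizes.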
Pages: 13
Related papers
35 in total
[1]   A statistical method for the detection of variants from next-generation resequencing of DNA pools [J].
Bansal, Vikas .
BIOINFORMATICS, 2010, 26 (12) :i318-i324
[2]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[3]   Screening for Partial Conjunction Hypotheses [J].
Benjamini, Yoav ;
Heller, Ruth .
BIOMETRICS, 2008, 64 (04) :1215-1222
[4]   Common and rare variants in multifactorial susceptibility to common diseases [J].
Bodmer, Walter ;
Bonilla, Carolina .
NATURE GENETICS, 2008, 40 (06) :695-701
[5]   High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency [J].
Calvo, Sarah E. ;
Tucker, Elena J. ;
Compton, Alison G. ;
Kirby, Denise M. ;
Crawford, Gabriel ;
Burtt, Noel P. ;
Rivas, Manuel ;
Guiducci, Candace ;
Bruno, Damien L. ;
Goldberger, Olga A. ;
Redman, Michelle C. ;
Wiltshire, Esko ;
Wilson, Callum J. ;
Altshuler, David ;
Gabriel, Stacey B. ;
Daly, Mark J. ;
Thorburn, David R. ;
Mootha, Vamsi K. .
NATURE GENETICS, 2010, 42 (10) :851-+
[6]   Uncovering the roles of rare variants in common disease through whole-genome sequencing [J].
Cirulli, Elizabeth T. ;
Goldstein, David B. .
NATURE REVIEWS GENETICS, 2010, 11 (06) :415-425
[7]   A framework for variation discovery and genotyping using next-generation DNA sequencing data [J].
DePristo, Mark A. ;
Banks, Eric ;
Poplin, Ryan ;
Garimella, Kiran V. ;
Maguire, Jared R. ;
Hartl, Christopher ;
Philippakis, Anthony A. ;
del Angel, Guillermo ;
Rivas, Manuel A. ;
Hanna, Matt ;
McKenna, Aaron ;
Fennell, Tim J. ;
Kernytsky, Andrew M. ;
Sivachenko, Andrey Y. ;
Cibulskis, Kristian ;
Gabriel, Stacey B. ;
Altshuler, David ;
Daly, Mark J. .
NATURE GENETICS, 2011, 43 (05) :491-+
[8]   Druley TE, 2009, NAT METHODS, V6, P263, DOI 10.1038/nmeth.1307
[9]   Human genetic variation and its contribution to complex traits [J].
Frazer, Kelly A. ;
Murray, Sarah S. ;
Schork, Nicholas J. ;
Topol, Eric J. .
NATURE REVIEWS GENETICS, 2009, 10 (04) :241-251
[10]   International genome project launched [J].
Hayden, Erika Check .
NATURE, 2008, 451 (7177) :378-379