BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity

Cited by: 40
Authors
Cantarel, Brandi L. [1]
Weaver, Daniel [2]
McNeill, Nathan [1]
Zhang, Jianhua [3]
Mackey, Aaron J. [4]
Reese, Justin [2]
Affiliations
[1] Baylor Inst Immunol Res, Baylor Hlth, Dallas, TX 75204 USA
[2] Genformatic LLC, Austin, TX 78731 USA
[3] Univ Texas MD Anderson Canc Ctr, Inst Appl Canc Sci, Houston, TX 77030 USA
[4] Univ Virginia, Sch Med, Ctr Publ Hlth Genom, Charlottesville, VA 22908 USA
Keywords
SNP; Genome variants; Bayesian; Latent class analysis; Cancer; Somatic mutation; CANCER; EXOME; MUTATIONS; DISCOVERY; FRAMEWORK; FORMAT; SAMPLE;
DOI
10.1186/1471-2105-15-104
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Accurate genomic variant detection is an essential step in gleaning medically useful information from genome data. However, low concordance among variant-calling methods reduces confidence in the clinical validity of whole genome and exome sequence data, and confounds downstream analysis for applications in genome medicine. Here we describe BAYSIC (BAYeSian Integrated Caller), which combines SNP variant calls produced by different methods (e.g. GATK, FreeBayes, Atlas, SamTools, etc.) into a more accurate set of variant calls. BAYSIC differs from majority voting, consensus or other ad hoc intersection-based schemes for combining sets of genome variant calls. Unlike other classification methods, the underlying BAYSIC model does not require training using a "gold standard" of true positives. Rather, with each new dataset, BAYSIC performs an unsupervised, fully Bayesian latent class analysis to estimate false positive and false negative error rates for each input method. The user specifies a posterior probability threshold according to the user's tolerance for false positive and false negative errors; lowering the posterior probability threshold allows the user to trade specificity for sensitivity while raising the threshold increases specificity in exchange for sensitivity.
Results: We assessed the performance of BAYSIC in comparison to other variant detection methods using ten low coverage (~5X) samples from The 1000 Genomes Project, a tumor/normal exome pair (40X), and exome sequences (40X) from positive control samples previously identified to contain clinically relevant SNPs. We demonstrated BAYSIC's superior variant-calling accuracy, both for somatic mutation detection and germline variant detection.
Conclusions: BAYSIC provides a method for combining sets of SNP variant calls produced by different variant calling programs. The integrated set of SNP variant calls produced by BAYSIC improves the sensitivity and specificity of the variant calls used as input. In addition to combining sets of germline variants, BAYSIC can also be used to combine sets of somatic mutations detected in the context of tumor/normal sequencing experiments.
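The role of the posterior probability threshold described in the abstract can be illustrated with a small sketch. It is not the BAYSIC implementation: BAYSIC estimates each caller's error rates from the data by Bayesian latent class analysis, whereas this example simply assumes them. The function names, the three hypothetical callers, their sensitivities and false positive rates, and the prevalence value are all made up for illustration, and the callers are assumed to be conditionally independent given the true state of the site.

```python
from math import prod

# Hypothetical per-caller error rates (assumed here, NOT estimated as BAYSIC does).
SENSITIVITY = [0.90, 0.85, 0.80]          # P(caller reports site | true variant)
FALSE_POSITIVE_RATE = [0.05, 0.10, 0.02]  # P(caller reports site | not a variant)
PREVALENCE = 0.5                          # assumed prior P(candidate site is a true variant)


def posterior_true_variant(calls):
    """Posterior P(true variant | calls) under a two-class latent class model.

    `calls` holds one 0/1 entry per caller (1 = the caller reported the site).
    Callers are treated as conditionally independent given the latent class.
    """
    like_true = prod(s if c else 1.0 - s for c, s in zip(calls, SENSITIVITY))
    like_false = prod(f if c else 1.0 - f for c, f in zip(calls, FALSE_POSITIVE_RATE))
    numerator = PREVALENCE * like_true
    return numerator / (numerator + (1.0 - PREVALENCE) * like_false)


def accept(calls, threshold):
    """Keep a site when its posterior reaches the user-chosen threshold."""
    return posterior_true_variant(calls) >= threshold


if __name__ == "__main__":
    for calls in [(1, 1, 1), (0, 0, 1), (0, 0, 0)]:
        print(calls, round(posterior_true_variant(calls), 4))
```

With these assumed rates, a site reported only by the third caller has a posterior of roughly 0.41, so it is kept at a threshold of 0.4 but rejected at 0.8. This is the specificity/sensitivity trade-off the abstract describes.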
Pages: 12