VAAST 2.0: Improved Variant Classification and Disease-Gene Identification Using a Conservation-Controlled Amino Acid Substitution Matrix

Times cited: 95
Authors
Hu, Hao [1]
Huff, Chad D. [1]
Moore, Barry [2]
Flygare, Steven [2]
Reese, Martin G. [3]
Yandell, Mark [2]
Affiliations
[1] Univ Texas MD Anderson Canc Ctr, Dept Epidemiol, Houston, TX 77030 USA
[2] Univ Utah, Sch Med, Eccles Inst Human Genet, Dept Human Genet, Salt Lake City, UT USA
[3] Omicia Inc, Emeryville, CA USA
Keywords
disease-gene finder; variant classifier; aggregative association test; rare Mendelian disease; complex disease; missense substitutions; common diseases; rare variants; association; BRCA1
DOI
10.1002/gepi.21743
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The need for improved algorithmic support for variant prioritization and disease-gene identification in personal genome data is widely acknowledged. We previously presented the Variant Annotation, Analysis, and Search Tool (VAAST), which employs an aggregative variant association test that combines amino acid substitution (AAS) information and allele frequencies. Here we describe and benchmark VAAST 2.0, which uses a novel conservation-controlled AAS matrix (CASM) to incorporate information about phylogenetic conservation. We show that the CASM approach improves VAAST's variant prioritization accuracy relative to its previous implementation and to SIFT, PolyPhen-2, and MutationTaster. We also show that VAAST 2.0 outperforms KBAC, WSS, SKAT, and the variable threshold (VT) test on published case-control datasets for Crohn disease (NOD2), hypertriglyceridemia (LPL), and breast cancer (CHEK2). VAAST 2.0 also improves search accuracy on simulated datasets across a wide range of allele frequencies, population-attributable disease risks, and degrees of allelic heterogeneity, factors that compromise the accuracy of other aggregative variant association tests. Finally, we demonstrate that, although most aggregative variant association tests are designed for common genetic diseases, these tests can easily be adapted as rare Mendelian disease-gene finders with a simple ranking-by-statistical-significance protocol, and that their performance compares very favorably with state-of-the-art filtering approaches. Filtering approaches, despite their popularity, perform suboptimally, especially as the number of case samples increases.
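The ranking-by-statistical-significance protocol mentioned in the abstract can be illustrated with a minimal sketch: run an aggregative test on every gene, then sort genes by p-value so the most significant candidate comes first. This sketch does not reproduce VAAST's own statistic. As an assumed stand-in, it collapses each gene to carrier counts (individuals with at least one qualifying rare variant) and applies a one-sided Fisher exact test for enrichment in cases. The function names `fisher_one_sided` and `rank_genes`, the input layout, and the gene labels are all hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple

# Hypothetical input layout: (case_carriers, n_cases, control_carriers, n_controls),
# where a "carrier" has at least one qualifying rare variant in the gene.
GeneCounts = Tuple[int, int, int, int]


def fisher_one_sided(case_carriers: int, n_cases: int,
                     ctrl_carriers: int, n_ctrls: int) -> float:
    """One-sided Fisher exact p-value: P(X >= case_carriers) under the
    hypergeometric null that carriers are spread at random over cases and controls."""
    total_carriers = case_carriers + ctrl_carriers
    denom = comb(n_cases + n_ctrls, total_carriers)
    upper = min(total_carriers, n_cases)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    numer = sum(
        comb(n_cases, k) * comb(n_ctrls, total_carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


def rank_genes(counts: Dict[str, GeneCounts]) -> List[Tuple[str, float]]:
    """Score every gene, then return (gene, p-value) pairs sorted from most to
    least significant; ties are broken alphabetically by gene name."""
    scored = [(gene, fisher_one_sided(*c)) for gene, c in counts.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Toy counts for illustration only; not data from the paper.
    toy = {
        "GENE_A": (5, 10, 0, 10),
        "GENE_B": (2, 10, 2, 10),
        "GENE_C": (3, 10, 1, 10),
    }
    print([gene for gene, _ in rank_genes(toy)])  # ['GENE_A', 'GENE_C', 'GENE_B']
```

The ranking step is independent of the test used to score each gene, which is what allows an association test designed for common disease to be reused as a disease-gene finder. A fuller sketch would substitute a variant-weighted statistic and, for small samples, a permutation-based p-value.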
Pages: 622-634
Page count: 13