SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences

Cited by: 46
Authors
Wong, Ka-Chun [1, 2]
Zhang, Zhaolei [1, 2, 3, 4]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[2] Univ Toronto, Donnelly Ctr Cellular & Biomol Res, Toronto, ON M5S 3E1, Canada
[3] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON M5S 3E1, Canada
[4] Univ Toronto, Dept Mol Genet, Toronto, ON M5S 1A8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
GENETIC-VARIATION; EVOLUTION; SELECTION; ALIGNMENT; DATABASE; SNVS;
DOI
10.1093/bioinformatics/btt769
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Recent advances in genome sequencing have revealed an abundance of non-synonymous polymorphisms among human individuals; consequently, it is of great interest and importance to predict whether such substitutions are functionally neutral or have deleterious effects. The accuracy of such prediction algorithms depends on the quality of the multiple-sequence alignment, which is used to infer how well an amino acid substitution is tolerated at a given position. Because orthologous protein sequences were scarce in the past, the existing prediction algorithms all include sequences of protein paralogs in the alignment, which can dilute the conservation signal and affect prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and improve the prediction performance.
Results: We have developed a novel prediction algorithm, named SNPdryad, which includes only protein orthologs in building a multiple-sequence alignment. Among many other innovations, SNPdryad uses different conservation scoring schemes and uses Random Forest as a classifier. We have tested SNPdryad on several datasets. We found that SNPdryad consistently outperformed other methods in several performance metrics, which we attribute to the exclusion of paralogous sequences. We have run SNPdryad on the complete human proteome, generating prediction scores for all the possible amino acid substitutions.
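The abstract's central idea, that an alignment column shows how well a position tolerates substitution, can be illustrated with a minimal sketch. The code below is not SNPdryad's scoring scheme (the abstract does not specify one); it assumes a simple Shannon-entropy conservation score, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column, gap="-"):
    """Return 1 - normalized Shannon entropy for one alignment column.

    Assumption: gaps are ignored and entropy is normalized by log2(20),
    the maximum for the 20 standard amino acids. A score of 1.0 means the
    column is perfectly conserved; lower scores mean more variation.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0  # a gap-only column carries no conservation signal
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(20)


def alignment_conservation(sequences):
    """Score every column of an alignment given as equal-length strings."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_conservation(col) for col in zip(*sequences)]


# Toy, made-up alignment of four hypothetical orthologs (not real data).
toy_orthologs = ["MKTA", "MKSA", "MRTA", "MK-A"]
scores = alignment_conservation(toy_orthologs)
```

In this sketch, adding divergent paralogs to `toy_orthologs` would raise the entropy of otherwise conserved columns, which is one way to picture the "diluted conservation signal" the abstract describes. Per-position scores of this kind could then serve as input features to a classifier such as a Random Forest.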
Pages: 1112-1119
Number of pages: 8