Prediction of phenotypes of missense mutations in human proteins from biological assemblies

Cited by: 19
Authors
Wei, Qiong [1]
Xu, Qifang [1]
Dunbrack, Roland L., Jr. [1]
Affiliations
[1] Fox Chase Canc Ctr, Inst Canc Res, Philadelphia, PA 19111 USA
Keywords
missense mutations; phenotype prediction; protein structure; biological assemblies; machine learning; SINGLE NUCLEOTIDE POLYMORPHISMS; MULTIPLE SEQUENCE ALIGNMENT; AMINO-ACID POLYMORPHISMS; DISEASE; INFERENCE; MUSCLE; SERVER
DOI
10.1002/prot.24176
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Proteins 2013. © 2012 Wiley Periodicals, Inc.
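The distinction the abstract draws between core, interface, and surface positions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify_location`, the use of relative surface area (RSA) values in the range 0 to 1, and the 0.25 burial cutoff are all assumptions chosen for illustration. The idea is only that a residue exposed in the isolated monomer but buried in the biological assembly lies in an interface, which monomer-only accessibility would misreport as surface.

```python
def classify_location(rsa_monomer, rsa_assembly, buried_cutoff=0.25):
    """Label a residue as 'core', 'interface', or 'surface'.

    rsa_monomer:   relative accessible surface area of the residue computed
                   on the isolated chain (assumed scale: 0 = buried, 1 = exposed).
    rsa_assembly:  the same quantity computed within the biological assembly.
    buried_cutoff: hypothetical burial threshold; not taken from the paper.
    """
    if rsa_assembly > rsa_monomer:
        # Adding neighbouring chains can only hide surface, never expose it.
        raise ValueError("assembly RSA cannot exceed monomer RSA")
    if rsa_monomer < buried_cutoff:
        return "core"        # already buried within its own chain
    if rsa_assembly < buried_cutoff:
        return "interface"   # exposed in the monomer, buried by a partner chain
    return "surface"         # exposed even in the full assembly


# Illustrative values only (not data from the paper).
examples = {
    "buried in monomer": classify_location(0.05, 0.05),
    "buried by partner": classify_location(0.60, 0.10),
    "always exposed": classify_location(0.60, 0.55),
}
```

In a fuller pipeline, labels or raw RSA values like these would be one feature among several (for example, alongside a profile-score difference) passed to a classifier; that step is omitted here.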
Pages: 199-213
Number of pages: 15