Functional Annotations Improve the Predictive Score of Human Disease-Related Mutations in Proteins

Cited by: 496
Authors
Calabrese, Remo [1]
Capriotti, Emidio [1]
Fariselli, Piero [1]
Martelli, Pier Luigi [1]
Casadio, Rita [1]
Affiliation
[1] Univ Bologna, Dept Biol, CIRB, Lab Biocomp, I-40126 Bologna, Italy
Keywords
missense mutation; support vector machine; Gene Ontology; disease-related SNP; SINGLE-NUCLEOTIDE POLYMORPHISMS; GENE ONTOLOGY; SEQUENCE; DATABASE; SNPS; IDENTIFICATION; INFORMATION; TERMS; TOOL
DOI
10.1002/humu.21047
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Single nucleotide polymorphisms (SNPs) are the simplest and most frequent form of human DNA variation, and they are also valuable as genetic markers of disease susceptibility. The most investigated SNPs are missense mutations, which result in residue substitutions in the protein. Here we propose SNPs&GO, an accurate method that, starting from a protein sequence, can predict whether a mutation is disease-related or not by exploiting the protein functional annotation. The scoring efficiency of SNPs&GO is as high as 82%, with a Matthews correlation coefficient equal to 0.63 over a wide set of annotated nonsynonymous mutations in proteins, including 16,330 disease-related and 17,432 neutral polymorphisms. SNPs&GO combines in a single framework information derived from the protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and it outperforms other available predictive methods. Hum Mutat 30, 1237-1244, 2009. (C) 2009 Wiley-Liss, Inc.
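The two scores quoted in the abstract, overall accuracy (scoring efficiency) and the Matthews correlation coefficient (MCC), are standard measures for a binary classifier. The sketch below shows how they are computed from a confusion matrix. It assumes the usual textbook definitions and treats "disease-related" as the positive class; the function names and the counts in the usage example are hypothetical toy values for illustration, not the SNPs&GO data or code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of mutations classified correctly (disease-related or neutral)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Positive class assumed here: disease-related mutation.
    Returns 0.0 when any row/column sum is zero, where MCC is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy counts, NOT taken from the paper.
    tp, tn, fp, fn = 80, 80, 20, 20
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")           # accuracy = 0.80
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")  # MCC      = 0.60
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the disease-related and neutral classes are unbalanced.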
Pages: 1237-1244
Number of pages: 8