How well can the accuracy of comparative protein structure models be predicted?

Cited by: 108
Authors
Eramian, David [1, 2, 3, 4]
Eswar, Narayanan [1, 3, 4]
Shen, Min-Yi [1, 3, 4]
Sali, Andrej [1, 3, 4]
Affiliations
[1] Univ Calif San Francisco, Dept Bioengn & Therapeut Sci, San Francisco, CA 94158 USA
[2] Univ Calif San Francisco, Graduate Grp Biophys, San Francisco, CA 94158 USA
[3] Univ Calif San Francisco, Dept Pharmaceut Chem, San Francisco, CA 94158 USA
[4] Calif Inst Quantitat Biosci, San Francisco, CA 94158 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
DOI
10.1110/ps.036061.108
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: they frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root-mean-squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores, which merely predict that one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for its intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: if possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This is the best correlation among the assessment criteria tested; the 13 other criteria achieved correlations ranging from 0.35 to 0.71.
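For readers unfamiliar with the two error measures named in the abstract, the following is a minimal sketch of how they could be computed once the native structure is known. It is not the authors' code. It assumes that native overlap is the fraction of Cα atoms lying within 3.5 Å of their native positions, and that the model has already been superposed onto the native structure, so no fitting is done here. The function names `ca_rmsd` and `native_overlap` and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(model, native):
    """Root-mean-squared deviation over paired C-alpha coordinates (in angstroms).

    Assumes `model` and `native` are equal-length sequences of (x, y, z)
    tuples that have already been superposed.
    """
    if not model or len(model) != len(native):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


def native_overlap(model, native, cutoff=3.5):
    """Fraction of C-alpha atoms within `cutoff` angstroms of their native position.

    The 3.5 angstrom default mirrors the NO3.5Å measure; the definition
    used here is an assumption, not taken from the record above.
    """
    if not model or len(model) != len(native):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    close = sum(1 for a, b in zip(model, native) if math.dist(a, b) <= cutoff)
    return close / len(model)


# Toy four-residue example: per-atom deviations are 0, 1, 3 and 4 angstroms.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 3.0), (11.4, 4.0, 0.0)]

print(round(ca_rmsd(model, native), 3))  # sqrt(26 / 4)
print(native_overlap(model, native))     # 3 of 4 atoms are within 3.5 angstroms
```

The protocol described in the abstract predicts these two quantities without access to the native structure; the sketch only shows what the predicted values stand for.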
Pages: 1881-1893
Number of pages: 13