PROTEIN SECONDARY STRUCTURE PREDICTION USING NEAREST-NEIGHBOR METHODS

Cited by: 117
Authors
YI, TM [1]
LANDER, ES [1]
Affiliation
[1] MIT, DEPT BIOL, CAMBRIDGE, MA 02142
Keywords
NEAREST-NEIGHBOR ALGORITHM; SECONDARY STRUCTURE PREDICTION; ARTIFICIAL INTELLIGENCE; PROTEIN STRUCTURE;
DOI
10.1006/jmbi.1993.1464
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We have studied the use of nearest-neighbor classifiers to predict the secondary structure of proteins. The nearest-neighbor rule states that a test instance is classified according to the classifications of "nearby" training examples from a database of known structures. In the context of secondary structure prediction, the test instances are windows of n consecutive residues, and the label is the secondary structure type (α-helix, β-strand, or coil) of the center position of the window. To define the neighborhood of a test instance, we employed a novel similarity metric based on the local structural environment scoring scheme of Bowie et al. In this manner, we have attempted to exploit the underlying structural similarity between segments of different proteins to aid in the prediction of secondary structure. Furthermore, in addition to using neighborhoods of fixed radius, we explored a modification of the standard nearest-neighbor algorithm that involved defining an "effective radius" for each exemplar by measuring its performance on a training set. Using these ideas, we achieved a peak prediction accuracy of 68%. Finally, we sought to improve the biological utility of secondary structure prediction by identifying the subset of the predictions that are most likely to be correct. Toward this end, we developed a nearest-neighbor estimator that produced not the traditional "one-state" prediction (α-helix, β-strand, or coil) but rather a probability distribution over the three states. It should be emphasized that this scheme estimates true probability values and that the resulting numbers are not pseudo-probability scores generated by simple normalization of the raw output of the predictor. Applying the mutual information statistic, we found that these probability triplets possess 58% more information than the one-state predictions. Furthermore, the probability estimates allow one to assign an a priori confidence level to the prediction at each residue. 
Using this approach, we found that the top 28% of the predictions were 86% accurate and the top 43% of the predictions were 81% accurate. These results indicate that, notwithstanding the limitations on overall accuracy of secondary structure prediction, a substantial proportion of a protein can be predicted with considerable accuracy.
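The windowed nearest-neighbor scheme described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`windows`, `similarity`, `predict_proba`, `most_confident`), the toy training windows, and above all the similarity measure, which simply counts identical residues and is not the environment-based metric of Bowie et al. that the authors used. The probabilities are plain neighbor-vote fractions, a crude stand-in rather than the calibrated estimator the abstract describes.

```python
from collections import Counter

STATES = ("H", "E", "C")  # alpha-helix, beta-strand, coil


def windows(seq, labels, n):
    """Yield (window, center_label) for every full window of n residues (n odd)."""
    half = n // 2
    for i in range(half, len(seq) - half):
        yield seq[i - half:i + half + 1], labels[i]


def similarity(a, b):
    """Toy similarity (assumption): number of positions with identical residues."""
    return sum(x == y for x, y in zip(a, b))


def predict_proba(window, training, k):
    """Return vote fractions over STATES among the k most similar training windows."""
    ranked = sorted(training, key=lambda ex: similarity(window, ex[0]), reverse=True)
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    return {s: votes[s] / len(neighbors) for s in STATES}


def most_confident(prob_list, threshold):
    """Return indices of predictions whose top-state probability is at least threshold."""
    return [i for i, p in enumerate(prob_list) if max(p.values()) >= threshold]


# Hypothetical, hand-made training windows; not real structural data.
training = [("AEL", "H"), ("AEK", "H"), ("VIV", "E"), ("GSG", "C")]

probs = [predict_proba("AEL", training, k=2), predict_proba("GEV", training, k=4)]
for p in probs:
    print(p)
print(most_confident(probs, threshold=0.8))
```

Filtering on the top-state probability mirrors the idea of reporting only the subset of predictions most likely to be correct; the threshold of 0.8 is arbitrary and chosen only for the toy data.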
Pages: 1117-1129
Page count: 13