Evaluation and improvement of multiple sequence methods for protein secondary structure prediction

Cited by: 16
Authors
Cuff, JA
Barton, GJ
Affiliations
[1] European Mol Biol Lab, European Bioinformat Inst, European Mol Biol Lab Outstn, Cambridge CB10 1SD, England
[2] Univ Oxford, Mol Biophys Lab, Oxford OX1 3QU, England
Keywords
protein; secondary structure prediction; combination of methods; benchmarks
DOI
10.1002/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8- to 3-state reduction methods shows variation of over 3% in apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/. Proteins 1999;34:508-519. © 1999 Wiley-Liss, Inc.
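The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two reduction mappings, the function names, and the tie-breaking rule in the vote are assumptions chosen for illustration; the abstract does not say which published reductions were compared or how ties were resolved. The sketch shows a per-residue Q3 score, a simple majority-vote consensus, and how the same prediction can score differently under two 8- to 3-state reductions.

```python
from collections import Counter

# Two illustrative 8- to 3-state reductions of DSSP states.
# Both mappings are assumptions for this sketch, not taken from the abstract.
REDUCE_BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCE_STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map each DSSP state to H, E, or C; any state not in the mapping becomes C."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted 3-state label equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions):
    """Per-residue majority vote over several predictions of equal length.

    Ties go to the earliest-listed prediction whose label is among the
    most frequent ones (an assumed rule for this sketch).
    """
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        result.append(next(s for s in column if counts[s] == best))
    return "".join(result)


# Toy data: the same prediction scored against two reductions of one DSSP string.
dssp = "HHGGTEEB-S"
prediction = "HHHHCEEECC"
score_broad = q3(prediction, reduce_states(dssp, REDUCE_BROAD))
score_strict = q3(prediction, reduce_states(dssp, REDUCE_STRICT))
voted = consensus(["HHHC", "HHEC", "CHEC", "CCEE"])
```

On this toy input the score changes with the reduction alone, which is the effect the abstract warns about when methods are compared under different reductions.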
Pages: 508-519
Number of pages: 12