Alignments grow, secondary structure prediction improves

Times cited: 141
Authors
Przybylski, D [1]
Rost, B [1]
Affiliations
[1] Columbia Univ, Dept Biochem & Mol Biophys, New York, NY 10027 USA
Keywords
protein structure prediction; solvent accessibility; evolutionary information; profiles-based multiple alignments; dynamic programming; neural networks; PSI-BLAST
DOI
10.1002/prot.10029
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth. Proteins 2002;46:197-205. (C) 2001 Wiley-Liss, Inc.
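The accuracy figures in the abstract (72% and 75%) are per-residue three-state accuracies: the percentage of residues assigned to the correct state among helix, strand, and other. A minimal sketch of that measure follows, together with the arithmetic behind the reported split of the gain. The one-letter codes `H`, `E`, `L`, the function names `q3` and `split_gain`, and the example strings are assumptions made for illustration; they are not taken from the paper, and the 60/20/20 shares are the approximate values quoted in the abstract ("more than 60%", "about 20%").

```python
# Illustrative sketch only: state codes, names, and sample data are assumed.
STATES = ("H", "E", "L")  # helix, strand, other


def q3(observed: str, predicted: str) -> float:
    """Return the percentage of residues whose predicted state matches the observed one."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    for symbol in observed + predicted:
        if symbol not in STATES:
            raise ValueError(f"unknown state symbol: {symbol!r}")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def split_gain(before: float, after: float, shares: dict) -> dict:
    """Split an accuracy gain (in percentage points) according to fractional shares."""
    gain = after - before
    return {source: round(gain * share, 2) for source, share in shares.items()}


if __name__ == "__main__":
    # Made-up 10-residue example: 8 of 10 states agree.
    print(q3("HHHHEEELLL", "HHHLEEELLH"))

    # Approximate shares quoted in the abstract, applied to the 72% -> 75% gain.
    print(split_gain(72.0, 75.0, {
        "database growth": 0.6,
        "alignment procedure": 0.2,
        "iterated PSI-BLAST": 0.2,
    }))
```

Read this way, the three-point gain corresponds to roughly 1.8 points from database growth and about 0.6 points each from the alignment-procedure changes and the iterated PSI-BLAST searches.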
Pages: 197-205
Page count: 9