Protein structural motif prediction in multidimensional φ-ψ space leads to improved secondary structure prediction

Cited by: 33
Authors
Mooney, Catherine [1]
Vullo, Alessandro [1]
Pollastri, Gianluca [1]
Affiliation
[1] University College Dublin, School of Computer Science and Informatics, Dublin 4, Ireland
Keywords
protein structure prediction; secondary structure; structural motifs; neural networks
DOI
10.1089/cmb.2006.13.1489
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A significant step towards establishing the structure and function of a protein is the prediction of the local conformation of the polypeptide chain. In this article, we present systems for the prediction of three new alphabets of local structural motifs. The motifs are built by applying multidimensional scaling (MDS) and clustering to pairwise angular distances for multiple φ-ψ angle values collected from high-resolution protein structures. The predictive systems, based on ensembles of bidirectional recurrent neural network architectures and trained on a large non-redundant set of protein structures, achieve 72%, 66%, and 60% correct motif prediction on an independent test set for dipeptides (six classes), tripeptides (eight classes), and tetrapeptides (14 classes), respectively, 28-30% above baseline statistical predictors. We then build a further system, based on ensembles of two-layered bidirectional recurrent neural networks, to map structural motif predictions into a traditional 3-class (helix, strand, coil) secondary structure. This system achieves 79.5% correct prediction using the "hard" CASP 3-class assignment, and 81.4% with a more lenient assignment, outperforming a sophisticated state-of-the-art predictor (Porter) trained under the same experimental conditions. The structural motif predictor is publicly available at: http://distill.ucd.ie/porter+/.
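The motif-building step described in the abstract (MDS followed by clustering of pairwise angular distances between multi-residue φ-ψ vectors) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the distance (Euclidean norm of per-angle circular differences), classical (Torgerson) MDS, and k-means with farthest-point initialization are stand-ins chosen for illustration; the helper names `angular_diff`, `pairwise_angular_distances`, `classical_mds`, and `kmeans` are hypothetical; and the input fragments are synthetic, not data from the paper.

```python
import numpy as np


def angular_diff(a, b):
    """Smallest absolute difference between angles in degrees, in [0, 180]."""
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) % 360.0
    return np.minimum(d, 360.0 - d)


def pairwise_angular_distances(fragments):
    """fragments: (n, 2k) array of concatenated (phi, psi) angles for k residues.

    Returns an (n, n) matrix: Euclidean norm of the per-angle circular differences
    (an assumed distance, chosen for illustration).
    """
    diff = angular_diff(fragments[:, None, :], fragments[None, :, :])
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: embed a distance matrix into `dims` coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)                  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]             # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # drop negative (non-Euclidean) parts
    return vecs[:, top] * scale


def kmeans(X, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization (a stand-in)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


# Synthetic dipeptide fragments (phi1, psi1, phi2, psi2), NOT data from the paper:
# one group near a helical region, one near an extended region.
rng = np.random.default_rng(0)
helix = rng.normal([-60.0, -45.0, -60.0, -45.0], 8.0, size=(20, 4))
strand = rng.normal([-120.0, 130.0, -120.0, 130.0], 8.0, size=(20, 4))
fragments = np.vstack([helix, strand])

D = pairwise_angular_distances(fragments)
coords = classical_mds(D, dims=2)
labels, _ = kmeans(coords, k=2)
print(labels)
```

In a real setting each k-residue window from a set of structures would contribute one 2k-dimensional row. The circular difference matters because φ and ψ wrap at ±180°: 170° and -170° are 20° apart, not 340°, so a plain Euclidean distance on raw angles would split conformations that are actually close.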
Pages: 1489-1502
Number of pages: 14