Assessing a novel approach for predicting local 3D protein structures from sequence

Cited by: 33
Authors
Benros, C [1]
de Brevern, AG [1]
Etchebest, C [1]
Hazout, S [1]
Affiliation
[1] Univ Paris 07, INSERM, U726, EBGM, F-75251 Paris, France
Keywords
library of fragments; sequence-structure relationship; local structure prediction; structural candidates; ab initio
DOI
10.1002/prot.20815
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification codes
071010; 081704
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded in a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments, each encoded as a series of seven PBs, and used the HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (the mean fragment of each cluster) that approximate local three-dimensional structure well, with a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is designed to make the best use of the sequence-structure relationships derived from this library of long protein fragments. To this end, we set up a system of 120 experts, each a logistic regression model optimized to discriminate, from sequence, one given prototype from the others. For a target sequence window, the experts computed the probability of sequence-structure compatibility for each prototype; the prototypes were then ranked, and the top scorers were proposed as structural candidates. A prediction was counted as successful when a prototype less than 2.5 Å from the true local structure was found among those proposed. This strategy yielded a prediction rate of 51.2% with an average of 4.2 candidates per sequence window. We also propose a confidence index to estimate prediction quality. Because our approach predicts from sequence alone, it will provide valuable information for proteins without structural homologs. The candidates will also contribute to global structure prediction by fragment assembly.
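The prediction step described in the abstract (one logistic-regression expert per prototype, ranking by probability, success if a proposed prototype is under 2.5 Å from the true structure) can be illustrated with the minimal Python sketch below. It is not the authors' implementation. The abstract gives neither the experts' input encoding nor the rule for how many candidates to propose, so the one-hot window encoding, the fixed `top_k`, and the names `one_hot_window`, `Expert`, `rank_candidates`, and `is_success` are all hypothetical. The RMSD values passed to `is_success` are assumed to be precomputed elsewhere.

```python
import math
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues, one-letter codes
WINDOW = 11                            # fragment length used in the paper
PB_LENGTH = 5                          # each protein block describes a 5-residue fragment
# One PB per central position of a 5-residue sub-fragment: 11 - 5 + 1 = 7 PBs per window.
N_PBS_PER_WINDOW = WINDOW - PB_LENGTH + 1


def one_hot_window(seq: str) -> list[float]:
    """HYPOTHETICAL encoding: 11 x 20 binary features (one per position and residue type)."""
    if len(seq) != WINDOW:
        raise ValueError(f"expected a {WINDOW}-residue window, got {len(seq)}")
    feats: list[float] = []
    for aa in seq:
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown residue {aa!r}")
        feats.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return feats


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


@dataclass
class Expert:
    """One logistic-regression expert: P(window is compatible with its prototype)."""
    prototype_id: int
    weights: list[float]
    bias: float

    def probability(self, features: list[float]) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return _sigmoid(z)


def rank_candidates(experts: list[Expert], window: str, top_k: int) -> list[int]:
    """Score every prototype for the window and return the top_k prototype ids.

    A fixed top_k is an ASSUMPTION made for illustration only.
    """
    feats = one_hot_window(window)
    ranked = sorted(experts, key=lambda e: e.probability(feats), reverse=True)
    return [e.prototype_id for e in ranked[:top_k]]


def is_success(candidates: list[int], rmsd_to_true: dict[int, float],
               threshold: float = 2.5) -> bool:
    """Success criterion from the abstract: some proposed prototype is < 2.5 Å away.

    rmsd_to_true maps prototype id -> Cα RMSD (Å) to the true local structure,
    ASSUMED to be computed beforehand (e.g. after optimal superposition).
    """
    return any(rmsd_to_true[pid] < threshold for pid in candidates)


if __name__ == "__main__":
    n = WINDOW * len(AMINO_ACIDS)
    # Toy experts whose scores depend only on their bias terms.
    toy = [Expert(0, [0.0] * n, 0.0), Expert(1, [0.0] * n, 2.0), Expert(2, [0.0] * n, -2.0)]
    picked = rank_candidates(toy, "ACDEFGHIKLM", top_k=2)
    print(picked, is_success(picked, {0: 3.1, 1: 2.0, 2: 1.0}))
```

A fixed `top_k` is only a simplification: the reported average of 4.2 candidates per window suggests that the number of proposed prototypes varies from one window to the next.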
Pages: 865-880
Number of pages: 16