A structural alphabet for local protein structures: Improved prediction methods

Cited by: 86
Authors
Etchebest, C [1]
Benros, C [1]
Hazout, S [1]
de Brevern, AG [1]
Affiliations
[1] Univ Paris 07, INSERM, Equipe Bioinformat Genom & Mol, U726, F-75251 Paris, France
Keywords
structure-sequence relationship; probabilistic approach; Bayes' rule; secondary structure; protein blocks; ab initio
DOI
10.1002/prot.20458
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q(16) prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities are reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q(16) = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q(16) by only 1%. An entropy index (N-eq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The N-eq is linearly correlated with the Q(16) prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate Q(16)(E) is deduced with a mean error of 5%. © 2005 Wiley-Liss, Inc.
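The abstract names two quantities without giving their formulas: the Q(16) prediction rate and the entropy index N-eq. The sketch below is illustrative only and rests on two assumptions not stated in this record: that Q(16) is the fraction of sequence positions whose predicted PB matches the observed PB, and that N-eq is the exponential of the Shannon entropy of the 16 predicted PB probabilities at a position (an "equivalent number" of PBs, ranging from 1 to 16). The function names `n_eq` and `q16` are hypothetical and are not taken from the authors' software.

```python
import math

N_BLOCKS = 16  # size of the PB alphabet, as stated in the abstract


def n_eq(probs):
    """Equivalent number of PBs at one position.

    Assumed definition: exp of the Shannon entropy of the predicted
    PB probability distribution (1 = one PB certain, 16 = uniform).
    """
    if len(probs) != N_BLOCKS:
        raise ValueError("expected one probability per Protein Block")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return math.exp(entropy)


def q16(predicted, observed):
    """Fraction of positions where the predicted PB equals the observed PB.

    Assumed definition; both arguments are equal-length sequences of PB labels.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(predicted)
```

Under these assumptions, a position where the predictor is certain gives an N-eq of 1, while a completely uninformative prediction gives 16, which matches the abstract's use of N-eq as an indicator of prediction quality.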
Pages: 810-827
Number of pages: 18