library of fragments;
sequence-structure relationship;
local structure prediction;
structural candidates;
ab initio;
DOI:
10.1002/prot.20815
Chinese Library Classification number:
Q5 [Biochemistry];
Q7 [Molecular Biology];
Discipline classification code:
071010;
081704;
Abstract:
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to make the best use of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination, from sequence, of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. A prediction was counted as successful when a prototype less than 2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
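The ranking and success criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature encoding of a sequence window, the expert weights, the fixed number of candidates, and all function names (`expert_probability`, `rank_prototypes`, `is_successful`) are assumptions made for illustration. The abstract reports 120 experts and an average of 4.2 candidates per window but does not state the candidate selection rule, so a toy setup with three experts and a fixed cut-off is used here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical expert: (weights, bias) of one logistic regression model.
Expert = Tuple[Sequence[float], float]


def expert_probability(weights: Sequence[float], bias: float,
                       features: Sequence[float]) -> float:
    """Logistic score of one prototype for an encoded sequence window."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_prototypes(experts: Dict[int, Expert], features: Sequence[float],
                    n_candidates: int) -> List[int]:
    """Score every prototype and return the ids of the top scorers."""
    scores = {pid: expert_probability(w, b, features)
              for pid, (w, b) in experts.items()}
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    return ranked[:n_candidates]


def is_successful(candidates: Sequence[int], rmsd_to_true: Dict[int, float],
                  threshold: float = 2.5) -> bool:
    """Success if any proposed prototype is < threshold (in angstroms, C-alpha
    RMSD) from the true local structure."""
    return any(rmsd_to_true[pid] < threshold for pid in candidates)


# Toy data (assumed values, not from the paper): three experts, two features.
experts = {
    0: ([1.0, 0.0], 0.0),
    1: ([0.0, 1.0], 0.0),
    2: ([-1.0, -1.0], 0.0),
}
features = [2.0, 0.5]
rmsd_to_true = {0: 3.1, 1: 2.0, 2: 1.0}

candidates = rank_prototypes(experts, features, n_candidates=2)
success = is_successful(candidates, rmsd_to_true)
```

In this toy case prototypes 0 and 1 are proposed, and the prediction counts as successful because prototype 1 lies 2.0 Å from the true structure. Proposing more candidates raises the chance of success at the cost of a less specific prediction, which is the trade-off behind reporting the prediction rate together with the average number of candidates.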
Affiliation:
Univ Paris 07, INSERM, E0436, Equipe Bioinformat Genom & Mol, FR-75251 Paris, France
Camproux, AC; Gautier, R
Gautier, R; Tufféry, P