Protein Secondary Structure Prediction with SPARROW

Cited by: 14
Authors
Bettella, Francesco [1, 2]
Rasinski, Dawid [1]
Knapp, Ernst Walter [1]
Affiliations
[1] Free Univ Berlin, Inst Chem, D-14195 Berlin, Germany
[2] DeCODE Genet, IS-101 Reykjavik, Iceland
Keywords
FOLD RECOGNITION; ASTRAL COMPENDIUM; LOCAL-STRUCTURE; SEQUENCES; ACCURACY; MODELS; CLASSIFICATION; POTENTIALS; EVOLUTION; DATABASE;
DOI
10.1021/ci200321u
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
A first step toward predicting the structure of a protein is to determine its secondary structure. The secondary structure information is generally used as a starting point to solve protein crystal structures. In the present study, a machine learning approach based on a complete set of two-class scoring functions was used. Such functions discriminate between two specific structural classes or between a single specific class and the rest. The approach uses a hierarchical scheme of scoring functions and a neural network. The parameters are determined by optimizing the recall of learning data. Quality control is performed by predicting separate independent test data. A first set of scoring functions is trained to correlate the secondary structures of residues with profiles of sequence windows of width 15, centered at these residues. The sequence profiles are obtained by multiple sequence alignment with PSI-BLAST. A second set of scoring functions is trained to correlate the secondary structures of the center residues with the secondary structures of all other residues in the sequence windows used in the first step. Finally, a neural network is trained using the results from the second set of scoring functions as input to make a decision on the secondary structure class of the residue in the center of the sequence window. Here, we consider the three-class problem of helix, strand, and other secondary structures. The corresponding prediction scheme "SPARROW" was trained with the ASTRAL40 database, which contains protein domain structures with less than 40% sequence identity. The secondary structures were determined with DSSP. In a loose assignment, the helix class contains all DSSP helix types (alpha, 3-10, pi), the strand class contains beta-strand and beta-bridge, and the third class contains the other structures. In a tight assignment, the helix and strand classes contain only alpha-helix and beta-strand classes, respectively.
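The loose and tight class assignments, and the width-15 windows, can be illustrated with a minimal Python sketch. This is not the SPARROW code. The function names, the three-class labels H/E/C, and the padding of windows at the chain termini are assumptions made for illustration; only the standard DSSP one-letter codes (H alpha-helix, G 3-10 helix, I pi-helix, E beta-strand, B beta-bridge) are taken as given. The real method also works on PSI-BLAST profile rows rather than on raw residue letters.

```python
# Hypothetical helpers, not part of SPARROW.
# DSSP one-letter codes: H alpha-helix, G 3-10 helix, I pi-helix,
# E beta-strand, B beta-bridge; every other code falls in the third class.
LOOSE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
TIGHT = {"H": "H", "E": "E"}


def reduce_dssp(dssp: str, scheme: str = "loose") -> str:
    """Map a DSSP string to three classes: H (helix), E (strand), C (other)."""
    if scheme not in ("loose", "tight"):
        raise ValueError("scheme must be 'loose' or 'tight'")
    table = LOOSE if scheme == "loose" else TIGHT
    return "".join(table.get(code, "C") for code in dssp)


def centered_windows(sequence: str, width: int = 15, pad: str = "-") -> list[str]:
    """Return one window of the given odd width centered at each residue.

    Padding the termini with a placeholder symbol is an assumption;
    the abstract does not say how chain ends are handled.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


if __name__ == "__main__":
    print(reduce_dssp("HGIEBTS-", "loose"))
    print(reduce_dssp("HGIEBTS-", "tight"))
    print(centered_windows("MKVLA", width=3))
```

Under the tight scheme, residues that DSSP labels as 3-10 helix, pi-helix, or beta-bridge move into the third class, so the two schemes define different three-class problems on the same structures.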
A 10-fold cross validation showed less than 0.8% deviation in the fraction of correct structure assignments between true prediction and recall of data used for training. Using sequences of 140,000 residues as a test data set, 80.46% +/- 0.35% of secondary structures are predicted correctly in the loose assignment, a performance very close to the best results in the field. Most applications are done with the loose assignment. However, the tight assignment yields 2.25% better prediction performance. With each individual prediction, we also provide a confidence measure giving the probability that the prediction is correct. The SPARROW software can be used and downloaded at http://agknapp.chemie.fu-berlin.de/sparrow/.
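The "fraction of correct structure assignments" quoted above is a per-residue accuracy over the three classes. A minimal sketch of that measure follows; the function name and the H/E/C labels are assumptions, and the example strings are invented for illustration, not taken from the paper.

```python
# Hypothetical helper, not part of SPARROW.
def fraction_correct(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted class equals the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Invented six-residue example: five of six classes agree.
    print(fraction_correct("HHHECC", "HHEECC"))
```

On a multi-chain test set the same ratio would be taken over all residues pooled together, which is one reasonable reading of the 140,000-residue figure; the abstract does not spell out the pooling.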
Pages: 545-556
Number of pages: 12