Protein secondary structure prediction for a single-sequence using hidden semi-Markov models

Cited by: 76
Authors
Aydin, Zafer
Altunbasak, Yucel
Borodovsky, Mark [1]
Affiliations
[1] Georgia Inst Technol, Sch Biol, Wallace H Coulter Dept Biomed Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Sch Biol, Ctr Bioinformat & Computat Biol, Atlanta, GA 30332 USA
[3] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
DOI
10.1186/1471-2105-7-178
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The accuracy of protein secondary structure prediction has been improving steadily towards the estimated theoretical limit of 88%. There are two types of prediction algorithms. Single-sequence algorithms assume that information about other (homologous) proteins is not available, while algorithms of the second type assume that such information is available and use it intensively. Single-sequence algorithms could make an important contribution to studies of proteins with no detected homologs; however, the accuracy of secondary structure prediction from a single sequence is not as high as when the additional evolutionary information is present.

Results: In this paper, we further refine and extend the hidden semi-Markov model (HSMM) initially considered in the BSPSS algorithm. We introduce an improved residue dependency model by considering the patterns of statistically significant amino acid correlation at structural segment borders. We also derive models that specialize in different sections of the dependency structure and incorporate them into the HSMM. In addition, we implement an iterative training method to refine estimates of the HSMM parameters. The three-state-per-residue accuracy and other accuracy measures of the new method, IPSSP, are shown to be comparable to or better than those of BSPSS and of PSIPRED, tested under the single-sequence condition.

Conclusions: We have shown that new dependency models and training methods bring further improvements to single-sequence protein secondary structure prediction. The results are obtained under cross-validation conditions using a dataset in which no pair of sequences has significant sequence similarity. As new sequences are added to the database, it is possible to augment the dependency structure and obtain even higher accuracy. Current and future advances should contribute to the improvement of function prediction for orphan proteins inscrutable to current similarity search methods.
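
The three-state-per-residue accuracy mentioned in the abstract is commonly reported as the fraction of residues whose predicted state matches the observed one. The following is a minimal sketch of that measure, not the authors' evaluation code. The function name `q3_accuracy` and the one-letter state alphabet H (helix), E (strand), C (coil) are assumptions made for illustration; the abstract does not specify either.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state.

    Both arguments are strings over an assumed three-letter alphabet:
    H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")

    allowed = set("HEC")  # assumed state labels, not taken from the paper
    if not set(predicted) <= allowed or not set(observed) <= allowed:
        raise ValueError("states must be one of H, E, C")

    # Count positions where prediction and observation agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Hypothetical 8-residue example with one mismatch at the sixth position.
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))
```

Per-residue accuracy of this kind says nothing about whether whole segments are recovered, which is presumably why the abstract also refers to other accuracy measures.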
Pages: 15