Improving protein secondary structure prediction based on short subsequences with local structure similarity

Cited by: 22
Authors
Lin, Hsin-Nan [1 ,2 ,3 ]
Sung, Ting-Yi [1 ]
Ho, Shinn-Ying [3 ]
Hsu, Wen-Lian [1 ]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Bioinformat Lab, Taipei, Taiwan
[2] Acad Sinica, Taiwan Int Grad Program, Bioinformat Program, Taipei 115, Taiwan
[3] Natl Chiao Tung Univ, Inst Bioinformat, Hsinchu, Taiwan
Source
BMC GENOMICS | 2010 / Volume 11
Keywords
ACCURACY;
DOI
10.1186/1471-2164-11-S4-S4
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Protein secondary structure (PSS) describes how the local conformation of amino acids forms regular structures, so it plays an important role in analyzing and modeling protein structures and in characterizing their structural topology. Although PSS prediction has been studied for decades, prediction accuracy has plateaued at around 80%, and further improvement is very difficult.

Results: In this paper, we present an improved dictionary-based PSS prediction method called SymPred, and a meta-predictor called SymPsiPred. We adopt the concept behind natural language processing techniques and propose synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein's evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction. On a large non-redundant dataset of 8,297 protein chains (DsspNr-25), the average Q3 values of SymPred and SymPsiPred are 81.0% and 83.9%, respectively. On the two latest independent test sets (EVA_Set1 and EVA_Set2), the average Q3 values of SymPred are 78.8% and 79.2%, respectively. SymPred outperforms other existing methods by 1.4% to 5.4%. We study two factors that may affect the performance of SymPred and find that it is very sensitive to the number of proteins of both known and unknown structures. This finding implies that SymPred and SymPsiPred have the potential to achieve higher accuracy as the number of protein sequences in the NCBInr and PDB databases increases.

Conclusions: Our experimental results show that local similarities in protein sequences typically exhibit conserved structures, which can be used to improve the accuracy of secondary structure prediction. As an application of synonymous words, we demonstrate a sequence alignment generated from the distribution of synonymous words shared by a pair of protein sequences. The two sequences are very dissimilar at the sequence level but very similar at the structural level, and we can align them nearly perfectly. The SymPred and SymPsiPred prediction servers are available at http://bio-cluster.iis.sinica.edu.tw/SymPred/.
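Two terms in the abstract can be made concrete with a short sketch. The Python below is not the SymPred implementation; the function names `ngrams` and `q3` are hypothetical and chosen only for illustration. It shows how overlapping n-gram "words" can be cut from an amino-acid sequence, and how a Q3 score is conventionally computed as the percentage of residues whose three-state label (assumed here to be H, E, or C) is predicted correctly.

```python
from typing import List


def ngrams(sequence: str, n: int) -> List[str]:
    """Return all overlapping length-n windows of an amino-acid sequence."""
    # A sequence shorter than n yields an empty list.
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the paper's datasets.
    print(ngrams("MKVLAT", 3))
    print(q3("HHHECC", "HHHCCC"))
```

The per-protein scores produced this way would then be averaged over a dataset to give figures comparable in kind to the average Q3 values quoted above.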
Pages: 14