A dynamic Bayesian network approach to protein secondary structure prediction

Cited by: 46
Authors
Yao, Xin-Qiu [1, 2, 3]
Zhu, Huaiqiu [1, 2, 3]
She, Zhen-Su [1, 2, 3, 4]
Affiliations
[1] Peking Univ, State Key Lab Turbulence & Complex Syst, Beijing 100871, Peoples R China
[2] Peking Univ, Dept Biomed Engn, Beijing 100871, Peoples R China
[3] Peking Univ, Ctr Theoret Biol, Beijing 100871, Peoples R China
[4] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90095 USA
Funding
National Natural Science Foundation of China
DOI
10.1186/1471-2105-9-49
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information about the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM). Results: In this paper, we report a new probabilistic method for protein secondary structure prediction, based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy compared with other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q3 accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that a significant improvement in prediction accuracy is achieved when it is combined with other methods by a simple consensus. Conclusion: The DBN method, using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues, produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different nature, the DBN and NN combine to form a more accurate method, DBNN. Future improvement may be achieved by combining DBNN with an SVM-type method.
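The abstract reports results as Q3 accuracy and mentions combining predictors "by a simple consensus". As a minimal sketch, assuming Q3 is the usual fraction of residues assigned the correct three-state label (H, E, C) and assuming the consensus is a per-residue majority vote (the abstract does not specify the rule), the two quantities could be computed as follows. The function names `q3` and `majority_consensus` and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H/E/C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def majority_consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over several predictors.

    Assumption: ties are broken in favour of the earliest-listed
    predictor; the paper's actual consensus rule may differ.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # First label, in predictor order, that reaches the top count.
        consensus.append(next(s for s in column if counts[s] == best))
    return "".join(consensus)


# Toy data, invented for illustration only.
observed = "CHHHHCEEEC"
pred_a = "CHHHCCEEEC"  # one error
pred_b = "CHHHHCEECC"  # one error
pred_c = "CCHHHCEEEC"  # one error
combined = majority_consensus([pred_a, pred_b, pred_c])
```

In this toy case each predictor errs at a different residue, so the vote corrects all three errors. Real predictors make correlated errors, so the gain from a consensus is typically much smaller.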
Pages: 13