A dynamic Bayesian network approach to protein secondary structure prediction

Cited by: 46

Authors:

Yao, Xin-Qiu [1,2,3]

Zhu, Huaiqiu [1,2,3]

She, Zhen-Su [1,2,3,4]

Affiliations:

[1] Peking Univ, State Key Lab Turbulence & Complex Syst, Beijing 100871, Peoples R China

[2] Peking Univ, Dept Biomed Engn, Beijing 100871, Peoples R China

[3] Peking Univ, Ctr Theoret Biol, Beijing 100871, Peoples R China

[4] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90095 USA

Source:

BMC BIOINFORMATICS | 2008 / Vol. 9 / Issue 1

Funding:

National Natural Science Foundation of China

DOI:

10.1186/1471-2105-9-49

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Background: Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information about the sequence-structure relationship. At present, however, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM). Results: In this paper, we report a new probabilistic method for protein secondary structure prediction based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence with a multivariate Gaussian distribution and simultaneously takes into account the dependency between the profile and the secondary structure and the dependency between the profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy over other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q3 accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that prediction accuracy improves significantly when it is combined with other methods by a simple consensus. Conclusion: The DBN method, which uses a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between the profiles of neighboring residues, produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different natures, the DBN and NN combine to form a more accurate method, DBNN. Further improvement may be achieved by combining DBNN with an SVM-type method.
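The abstract refers to two quantities that are easy to make concrete: Q3 accuracy (the fraction of residues assigned the correct one of three states) and a "simple consensus" of several predictors. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the consensus is computed, so a per-residue majority vote is assumed here, with ties going to the method listed first; the function names `q3_accuracy` and `consensus` and the H/E/C state letters are illustrative choices.

```python
from collections import Counter


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over several predictions.

    Assumption (not from the source): ties are resolved in favor of the
    method that appears earliest in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    voted = []
    for column in zip(*predictions):  # one column per residue position
        counts = Counter(column)
        top = max(counts.values())
        # Walk the votes in method order; the first top-voted state wins.
        voted.append(next(s for s in column if counts[s] == top))
    return "".join(voted)
```

For example, given three hypothetical predictions that each miss a different residue of an observed string such as "HHHHCCEEEC", `consensus` recovers the observed string, which illustrates why methods with uncorrelated errors can gain from being combined.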

Pages: 13
