Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Times cited: 44
Authors
Liu, Bin [1]
Wang, Xiaolong [1, 2]
Lin, Lei [2, 3]
Tang, Buzhou [1]
Dong, Qiwen [4]
Wang, Xuan [1]
Affiliations
[1] Shenzhen Grad Sch, Harbin Inst Technol, Shenzhen, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150006, Peoples R China
[3] Harbin Inst Technol, Dept Control Sci & Engn, Harbin 150006, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
National Natural Science Foundation of China
Keywords
SEQUENCE-BASED PREDICTION; SECONDARY STRUCTURE; DISULFIDE CONNECTIVITY; PSI-BLAST; RESIDUE; EVOLUTION; PROFILE; RECOGNITION; COMPLEXES; IDENTIFY
DOI
10.1186/1471-2105-10-381
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Predicting the binding sites between two interacting proteins provides important clues to protein function. Recent research on protein binding site prediction has relied mainly on well-known machine learning techniques such as artificial neural networks, support vector machines and conditional random fields. However, prediction performance is still too low for practical use, so new algorithms, theories and features need to be explored to improve it further.
Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine outperforms some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.
Conclusion: The improved prediction performance and computational efficiency of the hidden Markov support vector machine method can be attributed to three factors. First, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Second, the kernel trick is highly advantageous in this field. Third, by using the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples.
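The abstract frames binding site prediction as sequential labelling, where a residue's label depends both on its own features and on its neighbours' labels. As a rough illustration of that idea only, the sketch below shows Viterbi decoding for a linear chain scorer of the kind an HM-SVM uses at prediction time. It is not the authors' implementation: it assumes a linear (dot-product) emission score, two hypothetical labels (0 = non-binding, 1 = binding), a hypothetical two-element feature vector (a bias term plus relative accessible surface area), and hand-set weights. It omits the kernelised max-margin training with the cutting-plane algorithm described in the paper.

```python
from typing import List, Sequence

# Hypothetical label encoding: 0 = non-binding residue, 1 = binding residue.


def emission_score(w_emit: Sequence[Sequence[float]], label: int,
                   features: Sequence[float]) -> float:
    """Linear (dot-product) score of one residue's feature vector under a label."""
    return sum(w * f for w, f in zip(w_emit[label], features))


def viterbi_decode(features: Sequence[Sequence[float]],
                   w_emit: Sequence[Sequence[float]],
                   w_trans: Sequence[Sequence[float]]) -> List[int]:
    """Return the label sequence with the highest total emission + transition score."""
    n_labels = len(w_emit)
    if not features:
        return []
    # best[t][k]: best score of any labelling of residues 0..t ending in label k.
    best = [[emission_score(w_emit, k, features[0]) for k in range(n_labels)]]
    back: List[List[int]] = []  # back[t-1][k]: best previous label when residue t has label k
    for t in range(1, len(features)):
        row, ptr = [], []
        for k in range(n_labels):
            prev = max(range(n_labels), key=lambda j: best[t - 1][j] + w_trans[j][k])
            row.append(best[t - 1][prev] + w_trans[prev][k]
                       + emission_score(w_emit, k, features[t]))
            ptr.append(prev)
        best.append(row)
        back.append(ptr)
    # Trace back from the highest-scoring label of the last residue.
    labels = [max(range(n_labels), key=lambda k: best[-1][k])]
    for ptr in reversed(back):
        labels.append(ptr[labels[-1]])
    labels.reverse()
    return labels


# Hypothetical per-residue features: [bias, relative accessible surface area].
example_features = [[1.0, rsa] for rsa in (0.1, 0.9, 0.45, 0.8, 0.1)]
W_EMIT = [[0.0, 0.0],    # non-binding
          [-1.0, 2.0]]   # binding: favoured when accessibility > 0.5
W_TRANS = [[0.5, 0.0],   # reward for keeping the same label as the previous residue
           [0.0, 0.5]]

print(viterbi_decode(example_features, W_EMIT, W_TRANS))  # [0, 1, 1, 1, 0]
```

Scoring each residue independently with the same emission weights would give [0, 1, 0, 1, 0]. The transition weights pull the weakly scored third residue (accessibility 0.45) into the neighbouring binding segment, which loosely mirrors the neighbouring-label effect the abstract credits. In a trained HM-SVM these weights would be learned by max-margin training rather than set by hand, and a non-linear kernel could replace the dot product.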
Pages: 14