Protein-protein interactions play fundamental roles in almost all biological processes. Determining protein-protein binding affinity is both an important step and a challenging task in understanding molecular mechanisms and modeling biological systems. Unlike traditional methods such as empirical scoring algorithms and molecular dynamics, which are time consuming, we developed a fast and reliable machine learning method for predicting protein-protein binding affinity. From diverse protein-protein interface features calculated with commonly used, available tools, 432 features were obtained to represent hydrogen bonds, van der Waals forces, hydrophobic interactions, electrostatic forces, interface shape and configuration, and allosteric effects. Because the number of complexes with both a known structure and a known affinity is limited, feature importance evaluation was used to avoid overfitting and remove noise from the feature set. 154 optimal features were selected, and a prediction model based on random forest (RF) was then constructed. We demonstrate that the RF model yields promising results and that the predictive power of our method is better than that of other existing methods. Using leave-one-out cross-validation, our model gives a correlation coefficient (r) of 0.708 on the whole benchmark dataset of 133 complexes and a high r of 0.806 on the validated set of 53 samples. When applied to the same two independent datasets, our method outperforms two other methods and achieves high r values of 0.793 and 0.907, respectively. All results indicate that our method can be a useful tool for determining protein-protein binding affinity. (C) 2014 Elsevier B.V. All rights reserved.
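The evaluation protocol described in the abstract (leave-one-out cross-validation scored by a correlation coefficient between predicted and experimental values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the correlation is taken to be Pearson's r, a simple k-nearest-neighbour regressor stands in for the random forest, the data are synthetic, and the names `pearson_r`, `fit_knn`, `leave_one_out_r`, and `make_synthetic` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_knn(X, y, k=3):
    """Stand-in regressor (NOT a random forest): mean of the k nearest targets."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def leave_one_out_r(X, y, fit=fit_knn):
    """Hold out each sample once, train on the rest, and correlate the
    held-out predictions with the true values."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        preds.append(model(X[i]))
    return pearson_r(preds, y), preds


def make_synthetic(n=40, n_features=5, seed=0):
    """Synthetic data (assumption): the target depends on two features plus noise."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n)]
    y = [2.0 * row[0] - row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    r, _ = leave_one_out_r(X, y)
    print(f"Leave-one-out Pearson r = {r:.3f}")
```

Any regressor exposing the same fit-then-predict shape can be passed as `fit`. If feature selection is part of the pipeline, performing it inside each training fold keeps the held-out sample from influencing which features are chosen.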