A random forest classifier for protein-protein docking models

Cited by: 5
Authors
Barradas-Bautista, Didier [1]
Cao, Zhen [1]
Vangone, Anna [2]
Oliva, Romina [3]
Cavallo, Luigi [1]
Gromiha, Michael
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Kaust Catalysis Ctr, Phys Sci & Engn Div, Thuwal 239556900, Saudi Arabia
[2] Roche Innovat Ctr Munich Large Mol Res, Pharm Res & Early Dev, Therapeut Modal, D-82377 Penzberg, Germany
[3] Univ Parthenope Naples, Ctr Direzionale Isola C4, Dept Sci & Technol, I-80143 Naples, Italy
Source
BIOINFORMATICS ADVANCES | 2022 / Volume 2 / Issue 01
Keywords
INTER-RESIDUE CONTACTS; PREDICTION; COMPLEXES; FEATURES; RANKING; ELECTROSTATICS; CONSERVATION; REFINEMENT; POTENTIALS; AFFINITY;
DOI
10.1093/bioadv/vbab042
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained by popular docking software. To this end, we generated 3 × 10^4 docking models for each of the 230 complexes in the protein-protein benchmark, version 5, using three different docking programs (HADDOCK, FTDock and ZDOCK), for a cumulative set of approximately 7 × 10^6 docking models. Three different machine learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two algorithms and was selected for further optimization. Using a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named COnservation Driven Expert System (CoDES). Testing of CoDES on independent datasets, as well as results of its comparative performance with machine learning methods recently developed in the field for the scoring of docking decoys, confirm its state-of-the-art ability to discriminate correct from incorrect decoys, both in terms of global parameters and in terms of decoys ranked at the top positions.
Supplementary information is available at Bioinformatics Advances online.
Software and data availability statement: The docking models are available at https://doi.org/10.5281/zenodo.4012018. The programs underlying this article will be shared on request to the corresponding authors.
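As a rough illustration of two quantities mentioned in the abstract, the sketch below first checks the dataset-size arithmetic (3 × 10^4 models for each of 230 complexes) and then computes a top-N success rate, one common way to express how well a scorer places correct decoys at the top positions. This is not the authors' code: the function names, the toy data and the choice of top-N success rate as the ranking metric are assumptions made here for illustration only.

```python
# Hypothetical illustration only; names and toy data are assumptions,
# not part of CoDES or of the published evaluation protocol.

MODELS_PER_COMPLEX = 3 * 10**4  # figure quoted in the abstract
N_COMPLEXES = 230               # complexes in the benchmark, per the abstract


def total_models(models_per_complex, n_complexes):
    """Cumulative number of docking models over all complexes."""
    return models_per_complex * n_complexes


def top_n_success_rate(ranked_labels_per_complex, n):
    """Fraction of complexes with at least one correct model in the top n.

    ranked_labels_per_complex: one list of booleans per complex, ordered
    from best-scored to worst-scored model (True means a correct model).
    """
    if not ranked_labels_per_complex:
        raise ValueError("need at least one complex")
    hits = sum(1 for labels in ranked_labels_per_complex if any(labels[:n]))
    return hits / len(ranked_labels_per_complex)


# Toy rankings for three imaginary complexes (assumed data).
toy_rankings = [
    [False, True, False, False],   # first correct model at rank 2
    [True, False, False],          # first correct model at rank 1
    [False, False, False, True],   # first correct model at rank 4
]

print(total_models(MODELS_PER_COMPLEX, N_COMPLEXES))  # 6900000, i.e. about 7 x 10^6
for n in (1, 2, 4):
    print(n, top_n_success_rate(toy_rankings, n))
```

The product 6.9 × 10^6 is consistent with the "approximately 7 × 10^6" total quoted above. The success-rate function only needs per-complex rankings, so it could be applied to the output of any scoring function or classifier.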
Pages: 9