Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening

Cited by: 208
Authors
Ain, Qurrat Ul [1]
Aleksandrova, Antoniya [2]
Roessler, Florian D. [1]
Ballester, Pedro J. [3]
Affiliations
[1] Univ Cambridge, Dept Chem, Ctr Mol Informat, Cambridge CB2 1EW, England
[2] Univ Cambridge, Cavendish Lab, Cambridge CB3 0HE, England
[3] Aix Marseille Univ, Canc Res Ctr Marseille, INSERM, Inst Paoli Calmettes, CNRS UMR7258, U1068, Marseille, France
Funding
UK Medical Research Council
Keywords
PROTEIN-LIGAND-BINDING; ULTRAFAST SHAPE-RECOGNITION; OUT CROSS-VALIDATION; RANDOM FOREST; MOLECULAR DOCKING; DRUG DISCOVERY; DATA SETS; COMPLEXES; DESCRIPTORS; LEAD;
DOI
10.1002/wcms.1225
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of an SF on a particular target class, generating synthetic data to improve predictive performance, and modeling guidelines for SF development. WIREs Comput Mol Sci 2015, 5:405-424. doi: 10.1002/wcms.1225
Pages: 405-424
Page count: 20