A practical guide to machine-learning scoring for structure-based virtual screening

Authors
Viet-Khoa Tran-Nguyen
Muhammad Junaid
Saw Simeon
Pedro J. Ballester
Affiliations
[1] Centre de Recherche en Cancérologie de Marseille, Department of Bioengineering
[2] Imperial College London
Source
Nature Protocols | 2023 / Volume 18
Abstract
Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
Pages: 3460-3511
Page count: 51
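The abstract describes evaluating a scoring function on a benchmark of actives and inactives but does not name a metric. As an illustration only, the sketch below computes the enrichment factor, a commonly used virtual-screening measure: the hit rate among the top-ranked fraction of molecules divided by the hit rate over the whole library. The function name `enrichment_factor`, the default `fraction`, and the toy scores are assumptions made for this example and are not taken from the protocol or its repository. It also assumes that a higher score means "more likely active"; docking energies such as those reported by Smina would need their sign flipped first.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    labels: Sequence[int],
    fraction: float = 0.01,
) -> float:
    """Enrichment factor at the given top fraction of the ranked library.

    scores: predicted scores, higher = more likely active (assumption).
    labels: 1 for an experimentally active molecule, 0 for an inactive one.
    fraction: top-ranked share of the library to inspect (hypothetical default).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and at least one active")

    # Number of molecules in the top-ranked subset (at least one).
    n_top = max(1, round(n_total * fraction))

    # Rank molecules from best to worst predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])

    hit_rate_top = actives_in_top / n_top
    hit_rate_all = n_actives / n_total
    return hit_rate_top / hit_rate_all


# Toy library of 10 molecules with 2 actives (made-up numbers).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

An enrichment factor above 1 means the top of the ranked list contains proportionally more actives than a random selection would; comparing this value between a generic and a target-specific scoring function on the same test set is one way to read the evaluation steps the abstract lists.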