A practical guide to machine-learning scoring for structure-based virtual screening

Cited by: 0

Authors:

Viet-Khoa Tran-Nguyen

Muhammad Junaid

Saw Simeon

Pedro J. Ballester

Affiliations:

[1] Centre de Recherche en Cancérologie de Marseille

[2] Department of Bioengineering, Imperial College London

Source:

Nature Protocols | 2023 / Vol. 18

DOI: Not available

Abstract:

Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol, can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.
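
The partitioning and evaluation steps summarized above can be illustrated with a minimal, standard-library-only Python sketch. It assumes each molecule carries a binary active/inactive label and that a higher score means "more likely active". The names `Molecule`, `stratified_split` and `hit_rate_top_n` are hypothetical and do not come from the MLSF-protocol repository, and the stratified random split and top-N hit rate are generic choices, not necessarily the partitioning scheme or metrics the protocol itself uses.

```python
# Hypothetical sketch (not code from the MLSF-protocol repository):
# a stratified train/test partition and a top-N hit rate for evaluating
# a scoring function. All names below are illustrative assumptions.
import random
from dataclasses import dataclass


@dataclass
class Molecule:
    mol_id: str
    is_active: bool  # True = confirmed active, False = inactive or decoy


def stratified_split(molecules, test_fraction=0.2, seed=0):
    """Shuffle actives and inactives separately (seeded, so reproducible) and
    send about `test_fraction` of each group to the test set, keeping the
    active/inactive ratio similar in both partitions."""
    rng = random.Random(seed)
    train, test = [], []
    for label in (True, False):
        group = [m for m in molecules if m.is_active == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def hit_rate_top_n(scores, labels, n):
    """Fraction of actives among the n highest-scored molecules.
    Assumes higher score = more likely active; negate docking energies
    such as Smina affinities (kcal/mol, lower is better) before use."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(1 for _, is_active in top if is_active) / len(top)


if __name__ == "__main__":
    # Toy data: 10 actives and 40 decoys (made-up IDs, no real compounds).
    mols = [Molecule(f"act{i}", True) for i in range(10)]
    mols += [Molecule(f"dec{i}", False) for i in range(40)]
    train, test = stratified_split(mols, test_fraction=0.2, seed=42)
    print(len(train), len(test))  # 40 10
    print(hit_rate_top_n([0.9, 0.1, 0.8, 0.3], [True, False, False, True], 2))  # 0.5
```

Stratifying by label keeps the scarce actives represented in both partitions. A purely random split like this one is the simplest option and may give optimistic estimates when test molecules are close analogues of training molecules.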

Pages: 3460-3511

Number of pages: 51