Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding

Cited by: 3
Authors
Hadfield, Thomas E. [1]
Scantlebury, Jack [1]
Deane, Charlotte M. [1]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford Prot Informat Grp, Oxford, England
Keywords
Structure-based virtual screening; Machine learning; Interpretability; PROTEIN; DOCKING;
DOI
10.1186/s13321-023-00755-3
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high-affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than by identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty of establishing the ground-truth importance of each atom in a large-scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground-truth importance of each atom and compare it to the model-generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups 39% more efficiently than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS.
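The synthetic-data idea described in the abstract can be illustrated with a minimal sketch. This is not the synthVS implementation: the pharmacophore types, the 4.0 Å cutoff, the `Point` class, and the functions `sample_point_cloud` and `label_complex` are all hypothetical choices made for illustration. The sketch assumes a rule in which a complex is labelled active when any ligand donor or acceptor lies within the cutoff of a complementary pharmacophore in the point cloud; because the rule is deterministic, the atoms that satisfy it give the ground-truth importance directly.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical pharmacophore pairing and cutoff; not taken from the paper or synthVS.
COMPLEMENT = {"donor": "acceptor", "acceptor": "donor"}
CUTOFF = 4.0  # assumed distance threshold, in angstroms


@dataclass(frozen=True)
class Point:
    kind: str   # "donor", "acceptor", or "hydrophobic"
    xyz: tuple  # (x, y, z) coordinates


def sample_point_cloud(n_points, box, rng):
    """Randomly place pharmacophores in a cube spanning [-box, box] on each axis."""
    kinds = ["donor", "acceptor", "hydrophobic"]
    return [
        Point(rng.choice(kinds), tuple(rng.uniform(-box, box) for _ in range(3)))
        for _ in range(n_points)
    ]


def label_complex(ligand, cloud, cutoff=CUTOFF):
    """Apply the assumed deterministic binding rule.

    Returns (label, importance), where importance[i] is True when ligand
    atom i lies within `cutoff` of a complementary pharmacophore.
    """
    importance = []
    for atom in ligand:
        partner = COMPLEMENT.get(atom.kind)  # None for hydrophobic atoms
        hit = partner is not None and any(
            p.kind == partner and math.dist(atom.xyz, p.xyz) <= cutoff
            for p in cloud
        )
        importance.append(hit)
    return int(any(importance)), importance


# Usage: a three-atom toy ligand inside a randomly sampled point cloud.
rng = random.Random(0)
ligand = [
    Point("donor", (0.0, 0.0, 0.0)),
    Point("hydrophobic", (1.5, 0.0, 0.0)),
    Point("acceptor", (3.0, 0.0, 0.0)),
]
cloud = sample_point_cloud(20, 8.0, rng)
label, importance = label_complex(ligand, cloud)
```

The per-atom flags returned by `label_complex` play the role of the ground truth against which a model's atom-level attributions could be compared.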
Pages: 15