Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening

Cited by: 3
Authors
Cao, Zhonglin [1]
Sciabola, Simone [1]
Wang, Ye [1]
Affiliations
[1] Biogen, Med Chem, Cambridge, MA 02142 USA
Keywords
MOLECULAR DOCKING; INHIBITOR; DISCOVERY; BINDING; GENERATION; DATABASE; ZINC
DOI
10.1021/acs.jcim.3c01938
Chinese Library Classification number
R914 [Medicinal Chemistry]
Subject classification code
100701
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, active learning and Bayesian optimization have recently been proven as effective methods of narrowing down the search space. An essential component of those methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model can achieve high sample efficiency by finding hits with only a fraction of the entire library being virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, improving 8% over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can serve as a boost to the accuracy and sample efficiency of active learning-based virtual screening.
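To put the reported figures in concrete terms: 0.6% of 99.5 million is roughly 597,000 screened compounds, and 58.97% of the top 50,000 is about 29,485 recovered hits. The loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the library is synthetic, each compound has a single numeric feature, the "docking score" is a noisy linear function of that feature (lower is better), the surrogate is a least-squares line rather than a pretrained transformer or graph neural network, and the acquisition rule is a plain greedy pick of the lowest predicted scores. The names `fit_linear`, `active_learning_screen`, and `top_k_recall` are hypothetical.

```python
import random


def top_k_recall(found_ids, true_scores, k):
    """Fraction of the true top-k (lowest-score) compounds present in found_ids."""
    top_k = sorted(range(len(true_scores)), key=lambda i: true_scores[i])[:k]
    return len(set(top_k) & set(found_ids)) / k


def fit_linear(xs, ys):
    """Ordinary least squares for y = a * x + b on 1-D features (toy surrogate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All features identical: fall back to predicting the mean score.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


def active_learning_screen(features, oracle, init_size, batch_size, rounds, seed=0):
    """Score a random seed set, then repeatedly refit and greedily acquire a batch.

    `oracle(i)` stands in for the expensive scoring step (e.g. docking).
    Returns a dict mapping compound index -> true score for everything screened.
    """
    rng = random.Random(seed)
    n = len(features)
    scored = {i: oracle(i) for i in rng.sample(range(n), init_size)}
    for _ in range(rounds):
        ids = list(scored)
        a, b = fit_linear([features[i] for i in ids], [scored[i] for i in ids])
        # Greedy acquisition (assumed): take the unscored compounds with the
        # lowest predicted score, since lower means better in this toy setup.
        pool = [i for i in range(n) if i not in scored]
        pool.sort(key=lambda i: a * features[i] + b)
        for i in pool[:batch_size]:
            scored[i] = oracle(i)
    return scored


# Synthetic library: 2,000 compounds, one feature each, noisy linear score.
data_rng = random.Random(1)
features = [data_rng.uniform(0.0, 1.0) for _ in range(2000)]
true_scores = [-8.0 * x + data_rng.gauss(0.0, 0.5) for x in features]

scored = active_learning_screen(
    features, lambda i: true_scores[i], init_size=20, batch_size=20, rounds=5
)
recall = top_k_recall(scored, true_scores, k=100)
print(f"screened {len(scored)} of {len(features)} compounds, top-100 recall = {recall:.2f}")
```

In this sketch the sample-efficiency metric is the top-k recall at a fixed screening budget (here 120 of 2,000 compounds), which mirrors how the abstract reports results. A better surrogate would in principle raise that recall for the same budget.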
Pages: 1882-1891
Page count: 10