Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening

Cited by: 3
Authors
Cao, Zhonglin [1]
Sciabola, Simone [1]
Wang, Ye [1]
Affiliations
[1] Biogen, Med Chem, Cambridge, MA 02142 USA
Keywords
MOLECULAR DOCKING; INHIBITOR; DISCOVERY; BINDING; GENERATION; DATABASE; ZINC;
DOI
10.1021/acs.jcim.3c01938
Chinese Library Classification number
R914 [Medicinal Chemistry];
Subject classification code
100701;
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As commercially available compound collections grow exponentially to the scale of billions, active learning and Bayesian optimization have recently proven effective at narrowing the search space. An essential component of these methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model achieves high sample efficiency: it finds hits while only a fraction of the library is virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and a graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can therefore boost both the accuracy and the sample efficiency of active learning-based virtual screening.
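The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the surrogate here is a toy nearest-neighbor average over a single synthetic feature rather than a pretrained transformer or graph neural network, the acquisition rule is plain greedy selection, and every name (`active_learning_screen`, `knn_predict`, `top_k_recall`) and every library size is hypothetical. The final lines only restate the abstract's reported figures as absolute counts; the enrichment factor relative to random selection is a derived number, not one given in the source.

```python
import random

def top_k_recall(found_ids, true_scores, k):
    """Fraction of the k best compounds (lowest score) that appear in found_ids."""
    best = sorted(range(len(true_scores)), key=lambda i: true_scores[i])[:k]
    return sum(1 for i in best if i in found_ids) / k

def knn_predict(x, labeled, n_neighbors=3):
    """Toy surrogate: mean score of the nearest labeled compounds (1-D feature)."""
    nearest = sorted(labeled, key=lambda pair: abs(pair[0] - x))[:n_neighbors]
    return sum(score for _, score in nearest) / len(nearest)

def active_learning_screen(features, oracle, init_size, batch_size, rounds, seed=0):
    """Label a random initial set, then repeatedly label the batch that the
    surrogate predicts to score best (greedy acquisition, lower is better)."""
    rng = random.Random(seed)
    labeled = {i: oracle(i) for i in rng.sample(range(len(features)), init_size)}
    for _ in range(rounds):
        train = [(features[i], s) for i, s in labeled.items()]
        pool = [i for i in range(len(features)) if i not in labeled]
        pool.sort(key=lambda i: knn_predict(features[i], train))
        for i in pool[:batch_size]:
            labeled[i] = oracle(i)  # the expensive step, e.g. a docking run
    return labeled

# Synthetic stand-in for a compound library: one feature, noisy quadratic score.
data_rng = random.Random(42)
features = [data_rng.random() for _ in range(1000)]
true_scores = [(x - 0.3) ** 2 + data_rng.gauss(0, 0.001) for x in features]

labeled = active_learning_screen(
    features, lambda i: true_scores[i], init_size=20, batch_size=20, rounds=4
)
recall = top_k_recall(labeled, true_scores, k=50)
print(f"screened {len(labeled)} of {len(features)} compounds, top-50 recall = {recall:.2f}")

# The abstract's reported figures expressed as absolute counts.
library_size = 99_500_000
screened = round(library_size * 0.006)   # 0.6% of the library
top_found = round(50_000 * 0.5897)       # 58.97% of the top-50,000
enrichment = round(0.5897 / 0.006, 1)    # vs. the 0.6% expected from random picks
print(screened, top_found, enrichment)
```

In this sketch the screening budget is fixed at `init_size + rounds * batch_size` oracle calls, which mirrors how sample efficiency is measured: recall of the top-scoring compounds at a given fraction of the library screened.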
Pages: 1882-1891
Number of pages: 10
Related papers
50 in total
  • [1] DeepCPI: A Deep Learning-based Framework for Large-scale in silico Drug Screening
    Wan, Fangping
    Zhu, Yue
    Hu, Hailin
    Dai, Antao
    Cai, Xiaoqing
    Chen, Ligong
    Gong, Haipeng
    Xia, Tian
    Yang, Dehua
    Wang, Ming-Wei
    Zeng, Jianyang
    GENOMICS PROTEOMICS & BIOINFORMATICS, 2019, 17 (05) : 478 - 495
  • [2] Machine Learning-Enabled Pipeline for Large-Scale Virtual Drug Screening
    Gupta, Aayush
    Zhou, Huan-Xiang
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (09) : 4236 - 4244
  • [3] ScaffComb: A Phenotype-Based Framework for Drug Combination Virtual Screening in Large-Scale Chemical Datasets
    Ye, Zhaofeng
    Chen, Fengling
    Zeng, Jianyang
    Gao, Juntao
    Zhang, Michael Q.
    ADVANCED SCIENCE, 2021, 8 (24)
  • [4] BEAR: A Novel Virtual Screening Method Based on Large-Scale Bioactivity Data
    Kwon, Yeajee
    Park, Sera
    Lee, Jaeok
    Kang, Jiyeon
    Lee, Hwa Jeong
    Kim, Wankyu
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 63 (05) : 1429 - 1437
  • [5] Machine Learning-based Virtual Screening for STAT3 Anticancer Drug Target
    Wadood, Abdul
    Ajmal, Amar
    Junaid, Muhammad
    Rehman, Ashfaq Ur
    Uddin, Reaz
    Azam, Syed Sikander
    Khan, Alam Zeb
    Ali, Asad
    CURRENT PHARMACEUTICAL DESIGN, 2022, 28 (36) : 3023 - 3032
  • [6] Dockey: a modern integrated tool for large-scale molecular docking and virtual screening
    Du, Lianming
    Geng, Chaoyue
    Zeng, Qianglin
    Huang, Ting
    Tang, Jie
    Chu, Yiwen
    Zhao, Kelei
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (02)
  • [7] Deep Learning with Geometry-Enhanced Molecular Representation for Augmentation of Large-Scale Docking-Based Virtual Screening
    Yu, Lan
    He, Xiao
    Fang, Xiaomin
    Liu, Lihang
    Liu, Jinfeng
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2023, 63 (21) : 6501 - 6514
  • [8] Systematic Investigation of Docking Failures in Large-Scale Structure-Based Virtual Screening
    Xu, Min
    Shen, Cheng
    Yang, Jincai
    Wang, Qing
    Huang, Niu
    ACS OMEGA, 2022, 7 (43) : 39417 - 39428
  • [9] Large-scale virtual screening experiments on Windows Azure-based cloud resources
    Kiss, Tamas
    Borsody, Peter
    Terstyanszky, Gabor
    Winter, Stephen
    Greenwell, Pamela
    McEldowney, Sharron
    Heindl, Hans
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2014, 26 (10) : 1760 - 1770
  • [10] Discovery of novel A2AR antagonists through deep learning-based virtual screening
    Tang, Miru
    Wen, Chang
    Lin, Jie
    Chen, Hongming
    Ran, Ting
    ARTIFICIAL INTELLIGENCE IN THE LIFE SCIENCES, 2023, 3