Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening

Cited by: 3
Authors
Cao, Zhonglin [1]
Sciabola, Simone [1]
Wang, Ye [1]
Affiliations
[1] Biogen, Med Chem, Cambridge, MA 02142 USA
Keywords
MOLECULAR DOCKING; INHIBITOR; DISCOVERY; BINDING; GENERATION; DATABASE; ZINC
DOI
10.1021/acs.jcim.3c01938
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, active learning and Bayesian optimization have recently been shown to be effective methods for narrowing down the search space. An essential component of those methods is a surrogate machine learning model that predicts the desired properties of compounds. An accurate model can achieve high sample efficiency by finding hits while only a fraction of the entire library is virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and a pretrained graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50,000 compounds after screening only 0.6% of an ultralarge library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Pretrained models can thus boost the accuracy and sample efficiency of active learning-based virtual screening.
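To make the idea of sample efficiency concrete, the following is a minimal sketch of a pool-based active learning loop on synthetic data. It is not the authors' method: the abstract does not specify an acquisition function, so greedy selection is assumed here, and a toy k-nearest-neighbor surrogate stands in for the pretrained transformer and graph neural network. All function names, the one-dimensional descriptor, and the hidden scoring function are hypothetical. The helper headline_counts only converts the percentages quoted in the abstract into compound counts.

```python
import random


def headline_counts():
    """Convert the percentages quoted in the abstract into compound counts."""
    screened = round(0.006 * 99_500_000)  # 0.6% of the 99.5 million library
    found = round(0.5897 * 50_000)        # 58.97% of the top-50,000 compounds
    return screened, found


def hidden_score(x):
    """Hypothetical stand-in for an expensive docking run; lower is better."""
    return (x - 0.3) ** 2


def knn_predict(x, labeled, k=5):
    """Toy surrogate (assumption): mean score of the k nearest scored compounds."""
    nearest = sorted(labeled, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def active_learning_recall(n_library=2000, top_k=100, init=50,
                           batch=50, rounds=5, seed=0):
    """Run a greedy active learning loop and report recall of the true top-k."""
    rng = random.Random(seed)
    # Each synthetic "compound" is described by a single number in [0, 1).
    library = [rng.random() for _ in range(n_library)]
    true_scores = [hidden_score(x) for x in library]
    top_set = set(sorted(range(n_library), key=lambda i: true_scores[i])[:top_k])

    # Start from a random batch, then repeatedly score the compounds
    # that the surrogate predicts to be best (greedy acquisition).
    scored = set(rng.sample(range(n_library), init))
    for _ in range(rounds):
        labeled = [(library[i], true_scores[i]) for i in scored]
        pool = [i for i in range(n_library) if i not in scored]
        pool.sort(key=lambda i: knn_predict(library[i], labeled))
        scored.update(pool[:batch])

    recall = len(top_set & scored) / top_k
    return recall, len(scored)


if __name__ == "__main__":
    print(headline_counts())
    print(active_learning_recall())
```

With the default arguments the loop scores 300 of the 2,000 synthetic compounds (15% of the library); the returned recall is the fraction of the true top-100 found within that budget, which is the kind of metric the abstract reports at much larger scale.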
Pages: 1882-1891
Page count: 10