BigBind: Learning from Nonstructural Data for Structure-Based Virtual Screening

Cited by: 3
Authors
Brocidiacono, Michael [1]
Francoeur, Paul [2]
Aggarwal, Rishal [2]
Popov, Konstantin I. [1]
Koes, David Ryan [2]
Tropsha, Alexander [1]
Affiliations
[1] Univ N Carolina, Eshelman Sch Pharm, Chapel Hill, NC 27599 USA
[2] Univ Pittsburgh, Dept Computat & Syst Biol, Pittsburgh, PA 15260 USA
Keywords
Docking
DOI
10.1021/acs.jcim.3c01211
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 μM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
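The abstract reports three quantities: an activity cutoff of 10 μM, an AUC, and an enrichment factor at 1% (EF1%). The sketch below shows how these are commonly computed. It is only an illustration and not the authors' evaluation code: the function names are hypothetical, the cutoff is assumed to be inclusive and to apply to activities expressed in nM, and the EF definition used is the standard ratio of the hit rate in the top-scoring fraction to the hit rate in the whole library.

```python
import math


def is_active(activity_nm, cutoff_nm=10_000.0):
    """Label a compound active if its activity is at or below the cutoff.

    Assumption: activities are in nM, so 10 uM = 10,000 nM, and the
    cutoff is inclusive. The abstract does not state either detail.
    """
    return activity_nm <= cutoff_nm


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen active scores
    higher than a randomly chosen inactive (ties count as one half)."""
    actives = [s for lab, s in zip(labels, scores) if lab]
    inactives = [s for lab, s in zip(labels, scores) if not lab]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def enrichment_factor(labels, scores, fraction=0.01):
    """Hit rate among the top-scoring `fraction` of compounds divided by
    the hit rate in the whole library (EF1% when fraction=0.01)."""
    n = len(labels)
    total_hits = sum(1 for lab in labels if lab)
    if n == 0 or total_hits == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, math.ceil(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(1 for _, lab in ranked[:n_top] if lab)
    return (top_hits / n_top) / (total_hits / n)


if __name__ == "__main__":
    # Toy library: 200 compounds, 4 of them below the 10 uM cutoff.
    activities_nm = [50_000.0] * 200
    for idx in (0, 1, 50, 120):
        activities_nm[idx] = 500.0
    labels = [is_active(a) for a in activities_nm]
    # Made-up model scores that simply decrease with the compound index.
    scores = [1.0 - i / 200 for i in range(200)]
    print("AUC:", roc_auc(labels, scores))
    print("EF1%:", enrichment_factor(labels, scores, fraction=0.01))
```

Under this definition an EF1% of 1.0 corresponds to random ranking, so the reported median of 1.81 means the top 1% of ranked compounds contained about 1.8 times as many actives as a random selection would.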
Pages: 2488-2495
Page count: 8
Related papers
  • [1] Improved method of structure-based virtual screening based on ensemble learning
    Li, Jin
    Liu, WeiChao
    Song, Yongping
    Xia, JiYi
    RSC ADVANCES, 2020, 10 (13) : 7609 - 7618
  • [2] Structure-Based Pharmacophores for Virtual Screening
    Loewer, Martin
    Proschak, Ewgenij
    MOLECULAR INFORMATICS, 2011, 30 (05) : 398 - 404
  • [3] Structure-based virtual screening: an overview
    Lyne, PD
    DRUG DISCOVERY TODAY, 2002, 7 (20) : 1047 - 1055
  • [4] In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening
    Sieg, Jochen
    Flachsenberg, Florian
    Rarey, Matthias
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2019, 59 (03) : 947 - 961
  • [5] Structure-based virtual ligand screening
    Villoutreix, Bruno O.
    CURRENT PROTEIN & PEPTIDE SCIENCE, 2006, 7 (05) : 367 - 367
  • [6] Interaction prediction in structure-based virtual screening using deep learning
    Gonczarek, Adam
    Tomczak, Jakub M.
    Zareba, Szymon
    Kaczmar, Joanna
    Dabrowski, Piotr
    Walczak, Michal J.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2018, 100 : 253 - 258
  • [7] Machine-learning scoring functions for structure-based virtual screening
    Li Hongjian
    Sze, Kam-Heung
    Lu Gang
    Ballester, Pedro J.
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE, 2021, 11 (01)
  • [8] Traditional and machine learning approaches in structure-based drug virtual screening
    Zhang, Hong
    Gao, Yi Qin
    CHINESE JOURNAL OF CHEMICAL PHYSICS, 2024, 37 (02) : 177 - 191
  • [9] Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning
    Ricci-Lopez, Joel
    Aguila, Sergio A.
    Gilson, Michael K.
    Brizuela, Carlos A.
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2021, 61 (11) : 5362 - 5376
  • [10] DLAB: deep learning methods for structure-based virtual screening of antibodies
    Schneider, Constantin
    Buchanan, Andrew
    Taddese, Bruck
    Deane, Charlotte M.
    BIOINFORMATICS, 2022, 38 (02) : 377 - 383