BigBind: Learning from Nonstructural Data for Structure-Based Virtual Screening

Cited by: 3

Authors:

Brocidiacono, Michael [1]

Francoeur, Paul [2]

Aggarwal, Rishal [2]

Popov, Konstantin I. [1]

Koes, David Ryan [2]

Tropsha, Alexander [1]

Affiliations:

[1] Univ N Carolina, Eshelman Sch Pharm, Chapel Hill, NC 27599 USA

[2] Univ Pittsburgh, Dept Computat & Syst Biol, Pittsburgh, PA 15260 USA

Source:

JOURNAL OF CHEMICAL INFORMATION AND MODELING | 2023 / Vol. 64 / Issue 07

Keywords:

DOCKING;

DOI:

10.1021/acs.jcim.3c01211

Chinese Library Classification (CLC) number:

R914 [Medicinal chemistry];

Discipline classification code:

100701;

Abstract:

Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind only includes 20K complexes, models typically fail to generalize to new targets, and model performance is on par with that of models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using this data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 μM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
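
The abstract reports two quantities that are easy to misread: the 10 μM activity cutoff and the enrichment factor at 1% (EF1%). The following is a minimal sketch of how such values are commonly computed, not the authors' code. The function names `label_active` and `enrichment_factor` are hypothetical, and treating a compound as active when its activity is strictly below the cutoff is an assumption; the abstract only states that a 10 μM cutoff is used.

```python
from typing import Sequence


def label_active(activity_um: float, cutoff_um: float = 10.0) -> bool:
    """Label a compound active if its activity (in μM) is below the cutoff.

    Assumption: strict less-than; the abstract does not say how ties are handled.
    """
    return activity_um < cutoff_um


def enrichment_factor(
    scores: Sequence[float],
    labels: Sequence[bool],
    fraction: float = 0.01,
) -> float:
    """Hit rate among the top-scored fraction divided by the overall hit rate."""
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(scores)
    n_actives = sum(1 for label in labels if label)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Number of compounds in the top fraction (at least one).
    n_top = max(1, round(n_total * fraction))

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_hits = sum(1 for _, label in ranked[:n_top] if label)

    return (top_hits / n_top) / (n_actives / n_total)
```

Under this definition, an EF1% of 1.0 corresponds to random ranking, so the reported median of 1.81 would mean the top 1% of ranked compounds contains about 1.8 times as many actives as a random selection of the same size.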

Pages: 2488-2495

Page count: 8
