Adding Stochastic Negative Examples into Machine Learning Improves Molecular Bioactivity Prediction

Cited by: 16

Authors:

Caceres, Elena L. [1]

Mew, Nicholas C. [1]

Keiser, Michael J. [1]

Affiliations:

[1] Univ Calif San Francisco, Kavli Inst Fundamental Neurosci, Bakar Computat Hlth Sci Inst, Inst Neurodegenerat Dis, Dept Pharmaceut Chem, Dept, San Francisco, CA 94143 USA

Source:

JOURNAL OF CHEMICAL INFORMATION AND MODELING | 2020 / Vol. 60 / Issue 12

Funding:

National Science Foundation (USA);

Keywords:

NEURAL-NETWORKS; VALIDATION;

DOI:

10.1021/acs.jcim.0c00565

Chinese Library Classification (CLC) number:

R914 [Medicinal Chemistry];

Discipline classification code:

100701;

Abstract:

Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (122%). This gain was accompanied by a modest decrease in the temporal benchmark (13%). SNA increases in drug-screening performance were consistent for classification and regression tasks and outperformed y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how leveraging uncertainty into training improves predictions of drug-target relationships.
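The core idea of SNA, as the abstract describes it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_stochastic_negatives`, its parameters, the toy data, and the choice to draw a fresh set of negatives every epoch are all assumptions made for illustration. How the sampled pairs are labeled and weighted in training is not stated in the abstract and is left out here.

```python
import random


def sample_stochastic_negatives(tested_pairs, n_molecules, n_targets,
                                n_negatives, rng):
    """Pick untested (molecule, target) index pairs uniformly at random.

    Hypothetical helper: `tested_pairs` is a set of (molecule, target)
    tuples that already have a measurement and must not be overwritten.
    """
    untested = [
        (m, t)
        for m in range(n_molecules)
        for t in range(n_targets)
        if (m, t) not in tested_pairs
    ]
    if n_negatives > len(untested):
        raise ValueError("not enough untested pairs to sample from")
    # Sampling without replacement avoids duplicate negatives in one draw.
    return rng.sample(untested, n_negatives)


# Toy data (assumed): 4 molecules, 3 targets, 5 measured pairs.
tested = {(0, 0), (1, 1), (2, 2), (3, 0), (3, 1)}
rng = random.Random(0)

for epoch in range(3):
    # Negatives are "transient": drawn anew each epoch, never stored.
    negatives = sample_stochastic_negatives(tested, 4, 3, 2, rng)
    print(epoch, negatives)
```

Because the negatives are resampled rather than written into the data set, no untested pair is permanently treated as inactive, which is one plausible reading of "transient" in the abstract.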

Pages: 5957-5970

Page count: 14
