Minimal-uncertainty prediction of general drug-likeness based on Bayesian neural networks

Cited by: 35
Authors
Beker, Wiktor [1, 2]
Wolos, Agnieszka [1, 2]
Szymkuc, Sara [1, 2]
Grzybowski, Bartosz A. [1, 2, 3, 4]
Affiliations
[1] Polish Acad Sci, Inst Organ Chem, Warsaw, Poland
[2] Allchemy Inc, Highland, IN USA
[3] Inst Basic Sci IBS, Ctr Soft & Living Matter, Ulsan, South Korea
[4] UNIST, Ulsan Inst Sci & Technol, Dept Chem, Ulsan, South Korea
Keywords
CHEMISTRY; OPTIMIZATION
DOI
10.1038/s42256-020-0209-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Triaging unpromising lead molecules early in the drug discovery process is essential for accelerating its pace while avoiding the costs of unwarranted biological and clinical testing. Accordingly, medicinal chemists have been trying for decades to develop metrics, ranging from heuristic measures to machine-learning models, that could rapidly distinguish potential drugs from small molecules that lack drug-like features. However, none of these metrics has gained universal acceptance and the very idea of 'drug-likeness' has recently been put into question. Here, we evaluate drug-likeness using different sets of descriptors and different state-of-the-art classifiers, reaching an out-of-sample accuracy of 87-88%. Remarkably, because these individual classifiers yield different Bayesian error distributions, their combination and selection of minimal-variance predictions can increase the accuracy of distinguishing drug-like from non-drug-like molecules to 93%. Because total variance is comparable with its aleatoric contribution reflecting irreducible error inherent to the dataset (as opposed to the epistemic contribution due to the model itself), this level of accuracy is probably the upper limit achievable with the currently known collection of drugs.
Editor's summary: When designing new drugs, there are countless ways to create molecules, yet only a few interact with biological targets. Beker and colleagues provide here a graph-neural-network-based metric for drug-likeness that can guide the search.
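The idea of selecting minimal-variance predictions can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each classifier supplies several sampled probabilities of drug-likeness for a molecule (for example, from repeated stochastic forward passes), and it uses one common decomposition of the predictive variance of a binary classifier into aleatoric and epistemic parts. The function names and the model names `fingerprint_model` and `descriptor_model` are hypothetical.

```python
from statistics import fmean


def decompose(draws):
    """Split one model's predictive variance for one molecule.

    draws: sampled probabilities P(drug-like) from a single model
    (assumed input; the abstract does not specify the sampling scheme).
    """
    mean = fmean(draws)
    # Aleatoric part: average Bernoulli variance of the sampled predictions.
    aleatoric = fmean([p * (1.0 - p) for p in draws])
    # Epistemic part: spread of the sampled predictions around their mean.
    epistemic = fmean([(p - mean) ** 2 for p in draws])
    return mean, aleatoric, epistemic


def select_min_variance(draws_per_model):
    """Return (model name, predicted label, total variance) for the model
    whose prediction has the smallest total variance.

    draws_per_model: dict mapping a hypothetical model name to its draws.
    """
    best = None
    for name, draws in draws_per_model.items():
        mean, aleatoric, epistemic = decompose(draws)
        total = aleatoric + epistemic
        if best is None or total < best[2]:
            best = (name, mean, total)
    name, mean, total = best
    return name, mean >= 0.5, total


# Illustrative numbers only, not data from the paper.
example = {
    "fingerprint_model": [0.9, 0.7],
    "descriptor_model": [0.6, 0.4],
}
chosen, is_drug_like, variance = select_min_variance(example)
```

Under this decomposition the total variance equals mean × (1 − mean), so choosing the minimal-variance model amounts to trusting the most confident classifier for each molecule. Comparing `aleatoric` with the total then indicates how much of the remaining uncertainty is attributed to the data rather than to the model.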
Pages: 457 / +
Number of pages: 16