Comparison of Deep Learning With Multiple Machine Learning Methods and Metrics Using Diverse Drug Discovery Data Sets

Cited by: 234
Authors
Korotcov, Alexandru [1]
Tkachenko, Valery [1]
Russo, Daniel P. [2, 3]
Ekins, Sean [2]
Affiliations
[1] Sci Data Software LLC, 14914 Bradwill Court, Rockville, MD 20850 USA
[2] Collaborat Pharmaceut Inc, 840 Main Campus Dr,Lab 3510, Raleigh, NC 27606 USA
[3] Rutgers Ctr Computat & Integrat Biol, Camden, NJ 08102 USA
Keywords
deep learning; drug discovery; machine learning; pharmaceutics; support vector machine; IN-SILICO PHARMACOLOGY; ADMET EVALUATION; METABOLIC STABILITY; DISTRIBUTION VALUES; AQUEOUS SOLUBILITY; PREDICTION MODELS; SYSTEMS-ADME/TOX; NEURAL-NETWORKS; BAYESIAN MODELS; BASIC DRUGS;
DOI
10.1021/acs.molpharmaceut.7b00578
Chinese Library Classification number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint-type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications, from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, and physicochemical properties, as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient, and others. Based on ranked normalized scores for the metrics or data sets, Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar-type plots indicates when models are inferior or perhaps overtrained. These results also suggest the need for assessing deep learning further using multiple metrics with much larger scale comparisons, prospective testing, as well as assessment of different fingerprints and DNN architectures beyond those used.
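The metrics named in the abstract (AUC, F1 score, Cohen's kappa, Matthews correlation coefficient) can be illustrated with a minimal sketch for binary classification. This is not the authors' code: the function names `classification_metrics` and `roc_auc` are hypothetical, the formulas are the standard textbook definitions, and the example counts are made up for illustration only.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return F1, Cohen's kappa, and MCC from binary confusion-matrix counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "kappa": kappa, "mcc": mcc}


def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up counts and scores, purely illustrative.
    print(classification_metrics(tp=40, fp=10, tn=40, fn=10))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Reporting several of these together is useful because they can disagree: on imbalanced data sets, kappa and MCC penalize a model that mostly predicts the majority class, while AUC depends only on how the scores rank actives against inactives.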
Pages: 4462-4475
Number of pages: 14