Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction

Cited by: 37
Authors
Panapitiya, Gihan [1]
Girard, Michael [1]
Hollas, Aaron [1]
Sepulveda, Jonathan [1]
Murugesan, Vijayakumar [1]
Wang, Wei [1]
Saldanha, Emily [1]
Affiliations
[1] Pacific Northwest Natl Lab, Richland, WA 99352 USA
Source
ACS OMEGA | 2022 / Vol. 7 / Issue 18
Keywords
NEURAL-NETWORK; ORGANIC-MOLECULES; DRUG DISCOVERY; DIVERSE SET; DESCRIPTORS; CHALLENGE; SOLVATION; MODEL
DOI
10.1021/acsomega.2c00642
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goals of this study are to assess current deep learning methods for solubility prediction, to develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure and explore several different molecular representations, including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional atomic coordinates, using four different neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, perform feature analysis to understand which information about the molecular structure is most valuable for prediction, and perform a transfer learning and data size study to understand the impact of data availability on model performance.
Pages: 15695-15710
Page count: 16