Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning

Cited by: 6
Authors
Bao, Zeqing [1]
Tom, Gary [2,3,4]
Cheng, Austin [2,3,4]
Watchorn, Jeffrey [5]
Aspuru-Guzik, Alan [2,3,4,5,6,7,8,9]
Allen, Christine [1,5,7]
Affiliations
[1] Univ Toronto, Leslie Dan Fac Pharm, Toronto, ON M5S 3M2, Canada
[2] Univ Toronto, Dept Chem, Toronto, ON M5S 3H6, Canada
[3] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
[4] Vector Inst Artificial Intelligence, Toronto, ON M5S 1M1, Canada
[5] Accelerat Consortium, Toronto, ON M5S 3H6, Canada
[6] Canadian Inst Adv Res CIFAR, Toronto, ON M5S 1M1, Canada
[7] Univ Toronto, Dept Chem Engn & Appl Chem, Toronto, ON M5S 3E5, Canada
[8] Univ Toronto, Dept Mat Sci & Engn, Toronto, ON M5S 3E4, Canada
[9] Vector Inst, CIFAR Artificial Intelligence Res Chair, Toronto, ON M5S 1M1, Canada
Keywords
AQUEOUS SOLUBILITY; ORGANIC-COMPOUNDS; NEURAL-NETWORK; MELTING-POINTS; DISCOVERY; CHEMISTRY; MODELS; QSPR; 2D
DOI
10.1186/s13321-024-00911-3
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available only in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, most existing ML research has focused on predicting aqueous solubility and/or solubility at specific temperatures, which restricts model applicability in pharmaceutical development. To bridge this gap, we compiled a dataset of 27,000 solubility datapoints, comprising the solubility of small molecules measured in a range of binary solvent mixtures at various temperatures. Next, a panel of ML models was trained on this dataset, with hyperparameters tuned using Bayesian optimization. The top-performing models, both gradient-boosted decision trees (light gradient boosting machine and extreme gradient boosting), achieved a mean absolute error (MAE) of 0.33 for LogS (S in g/100 g) on the holdout set. These models were further validated through a prospective study, in which the solubility of four drug molecules was predicted by the models and then checked against in-house solubility experiments. This prospective study demonstrated that the models accurately predicted the solubility of solutes in specific binary solvent mixtures at different temperatures, especially for drugs whose features closely align with those of the solutes in the dataset (MAE < 0.5 for LogS).
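The abstract reports errors in LogS units, with S in g/100 g. As a minimal sketch, assuming LogS is the base-10 logarithm of S (the abstract does not state the base), the Python below computes LogS, the MAE between measured and predicted LogS values, and the corresponding geometric-mean fold error 10^MAE. The function names are hypothetical helpers for illustration, not the authors' code, and the example solubilities are invented rather than taken from the dataset.

```python
import math
from typing import Sequence


def log_s(solubility_g_per_100g: float) -> float:
    """Return LogS, assumed here to be log10 of solubility in g solute / 100 g solvent."""
    if solubility_g_per_100g <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    return math.log10(solubility_g_per_100g)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |measured - predicted| over paired LogS values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_fold_error(mae_log10: float) -> float:
    """An MAE in log10 units is the log10 of the geometric-mean fold error."""
    return 10 ** mae_log10


if __name__ == "__main__":
    # Hypothetical measured and predicted solubilities in g/100 g (not from the paper).
    measured = [1.0, 10.0, 0.5]
    predicted = [2.0, 5.0, 0.5]
    mae = mean_absolute_error([log_s(s) for s in measured],
                              [log_s(s) for s in predicted])
    print(f"MAE (LogS): {mae:.3f}")  # MAE (LogS): 0.201
    print(f"Fold error at MAE 0.33: {geometric_mean_fold_error(0.33):.2f}")  # 2.14
```

Read this way, an MAE of 0.33 log units corresponds to predictions that deviate from measurements by a factor of roughly 2.1 in the geometric-mean sense, and the MAE < 0.5 reported for the prospective study corresponds to a factor below roughly 3.2.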
Pages: 17