Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis
Citations: 11
Authors:
Stienstra, Cailum M. K. [1]
Ieritano, Christian [1]
Haack, Alexander [1]
Hopkins, W. Scott [1,2,3]
Affiliations:
[1] Univ Waterloo, Dept Chem, Waterloo, ON N2L 3G1, Canada
[2] Watermine Innovat, Waterloo, ON N0B 2T0, Canada
[3] Ctr Eye & Vis Res, Hong Kong 999077, Peoples R China
Funding:
Natural Sciences and Engineering Research Council of Canada;
Keywords:
DRUG-DELIVERY REVIEWS;
AQUEOUS SOLUBILITY;
LIQUID-CHROMATOGRAPHY;
PHYSICOCHEMICAL PROPERTIES;
ATMOSPHERIC-PRESSURE;
PREDICTION;
IONS;
MOLECULES;
DOI:
10.1021/acs.analchem.3c00921
Chinese Library Classification number:
O65 [Analytical Chemistry];
Discipline classification code:
070302 ;
081704 ;
Abstract:
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., number of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
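The abstract reports R² and RMSE after 5-fold random cross-validation. The sketch below illustrates one way such fold-wise scores might be computed. It is not the authors' pipeline: the one-feature least-squares model stands in for their ML regressors and ensemble stacking, the data are synthetic, all function names are hypothetical, and reading the "±" value as the standard deviation across folds is an assumption.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (single feature; a stand-in model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def r2_and_rmse(y_true, y_pred):
    """Return (coefficient of determination, root-mean-square error)."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y_true))

def k_fold_scores(xs, ys, k=5, seed=0):
    """Shuffle indices once, split into k folds, and score each held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        preds = [a * xs[i] + b for i in test_idx]
        scores.append(r2_and_rmse([ys[i] for i in test_idx], preds))
    return scores

# Synthetic data only (NOT the paper's measurements): 333 points to mirror
# the dataset size mentioned in the abstract, with a noisy linear relation.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(333)]
ys = [0.5 * x - 2 + rng.gauss(0, 1) for x in xs]

scores = k_fold_scores(xs, ys)
r2s = [r2 for r2, _ in scores]
rmses = [rmse for _, rmse in scores]
print(f"R2 = {statistics.fmean(r2s):.2f}, "
      f"RMSE = {statistics.fmean(rmses):.2f} ± {statistics.stdev(rmses):.2f}")
```

Each sample is held out exactly once, so every score comes from data the model did not see during fitting.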
Pages: 10309-10321
Page count: 13
Related papers
33 items in total