Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis

Cited by: 11
Authors
Stienstra, Cailum M. K. [1]
Ieritano, Christian [1]
Haack, Alexander [1]
Hopkins, W. Scott [1, 2, 3]
Affiliations
[1] Univ Waterloo, Dept Chem, Waterloo, ON N2L 3G1, Canada
[2] Watermine Innovat, Waterloo, ON N0B 2T0, Canada
[3] Ctr Eye & Vis Res, Hong Kong 999077, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DRUG-DELIVERY REVIEWS; AQUEOUS SOLUBILITY; LIQUID-CHROMATOGRAPHY; PHYSICOCHEMICAL PROPERTIES; ATMOSPHERIC-PRESSURE; PREDICTION; IONS; MOLECULES;
DOI
10.1021/acs.analchem.3c00921
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., # of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
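Illustrative note: the abstract reports scores such as "RMSE = 1.03 ± 0.10" after 5-fold random cross-validation. The sketch below shows one common way such a figure can be produced, assuming the ± value is the standard deviation of the per-fold RMSE (the abstract does not state this). It is not the authors' pipeline: a single-feature least-squares line stands in for their ML regressors and ensemble stacking, the data are synthetic, and all function names are hypothetical. Only the dataset size (333 instances) and the fold count (5) come from the abstract.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k, seed):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold; return per-fold RMSE and R^2."""
    rmses, r2s = [], []
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_true = [ys[i] for i in fold]
        y_pred = [a * xs[i] + b for i in fold]
        rmses.append(rmse(y_true, y_pred))
        r2s.append(r2(y_true, y_pred))
    return rmses, r2s


# Synthetic stand-in data (assumption): 333 instances, one hypothetical descriptor.
rng = random.Random(42)
xs = [rng.uniform(-3.0, 3.0) for _ in range(333)]
ys = [1.5 * x + rng.gauss(0.0, 1.0) for x in xs]

fold_rmse, fold_r2 = cross_validate(xs, ys, k=5, seed=0)
print(f"RMSE = {statistics.fmean(fold_rmse):.2f} ± {statistics.stdev(fold_rmse):.2f}")
print(f"R^2  = {statistics.fmean(fold_r2):.2f}")
```

Shuffling before splitting is what makes the folds "random"; with 333 instances each held-out fold has 66 or 67 members, so every per-fold score rests on a fairly small sample and the spread across folds is worth reporting alongside the mean.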
Pages: 10309-10321
Number of pages: 13