Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis

Cited by: 10
Authors
Stienstra, Cailum M. K. [1]
Ieritano, Christian [1]
Haack, Alexander [1]
Hopkins, W. Scott [1, 2, 3]
Affiliations
[1] Univ Waterloo, Dept Chem, Waterloo, ON N2L 3G1, Canada
[2] Watermine Innovat, Waterloo, ON N0B 2T0, Canada
[3] Ctr Eye & Vis Res, Hong Kong 999077, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DRUG-DELIVERY REVIEWS; AQUEOUS SOLUBILITY; LIQUID-CHROMATOGRAPHY; PHYSICOCHEMICAL PROPERTIES; ATMOSPHERIC-PRESSURE; PREDICTION; IONS; MOLECULES;
DOI
10.1021/acs.analchem.3c00921
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., # of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
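Illustrative note: the abstract reports R² and RMSE (with a spread) after 5-fold random cross-validation. The sketch below shows one way such scores could be computed, using only the Python standard library. Everything in it is an assumption made for illustration: the data are synthetic, the single-feature least-squares line stands in for the stacked ML regressors, and the helper names (make_synthetic, k_fold_indices, cross_validate) are hypothetical. It is not the authors' pipeline and does not reproduce their numbers.

```python
import math
import random
import statistics


def make_synthetic(n, seed):
    """Synthetic (x, y) pairs; a stand-in for real descriptor/target data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [-0.5 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_t = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=5, seed=0):
    """Return (mean fold RMSE, std of fold RMSE, R² over pooled held-out predictions)."""
    folds = k_fold_indices(len(xs), k, random.Random(seed))
    fold_rmse, pooled_true, pooled_pred = [], [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the training folds only, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_true = [ys[i] for i in test_idx]
        y_pred = [a * xs[i] + b for i in test_idx]
        fold_rmse.append(rmse(y_true, y_pred))
        pooled_true.extend(y_true)
        pooled_pred.extend(y_pred)
    return statistics.fmean(fold_rmse), statistics.stdev(fold_rmse), r2(pooled_true, pooled_pred)


xs, ys = make_synthetic(333, seed=1)  # 333 mirrors the dataset size in the abstract
mean_rmse, sd_rmse, pooled_r2 = cross_validate(xs, ys, k=5, seed=0)
print(f"RMSE = {mean_rmse:.2f} +/- {sd_rmse:.2f}, R2 = {pooled_r2:.2f}")
```

Reporting the RMSE as a mean over folds with its standard deviation is one common reading of an "RMSE ± spread" figure; whether the paper computes its spread this way is not stated in the abstract.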
Pages: 10309-10321
Number of pages: 13