Machine learning approach to predict Hansen solubility parameters of cocrystal coformers via integrating group contribution and COSMO-RS

Cited by: 3
Authors
Li, Chunrong [1]
Li, Zongqi [1]
Liu, Xinyan [1]
Xu, Jikun [1]
Zhang, Chuntao [1]
Affiliations
[1] Wuhan Univ Sci & Technol, Sch Chem & Chem Engn, Wuhan 430081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Machine Learning; Hansen solubility parameters (HSPs); Conductor-like Screening Model for Real Solvents (COSMO-RS); Group Contribution (GC); SHapley Additive exPlanation (SHAP); MODELS; SOLVENTS; DESIGN
DOI
10.1016/j.molliq.2024.125319
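Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract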
Hansen solubility parameters (HSPs) have attracted considerable attention for predicting the tendency of pharmaceutical cocrystals to form and for screening cocrystal coformers (CCFs). However, this application is limited by the lack of models that accurately predict HSP values for drug CCFs with greater structural complexity. Accordingly, three machine learning (ML) models, namely an ANN (Artificial Neural Network), XGBoostRegressor (Extreme Gradient Boosting Regressor) and LGBMRegressor (Light Gradient Boosting Machine Regressor), were developed to predict HSPs for screening drug CCFs. An HSP database of 181 CCFs (comprising alcohols, alkenes, aromatics, haloalkanes, amines, ketones, ethers, amides, esters, pharmaceuticals, alkanes, acids and nitroalkanes) was established and divided into a training set (140 compounds) and a test set (41 compounds with various polarities and functional groups, covering solid reagents and solvents). The molecular descriptors used for prediction combined GC (Group Contribution) descriptors with COSMO-RS (Conductor-like Screening Model for Real Solvents) sigma-moments and energy descriptors. The results showed that ANN and XGBoostRegressor outperformed LGBMRegressor in predicting HSPs for CCFs. Finally, SHapley Additive exPlanations (SHAP) was used to visualize and explain the most important features and their effects on the HSP predictions of XGBoostRegressor, indicating that CH3, M2 and MHbdon3 had a significant influence on, and contributed strongly to, the prediction of δd, δp and δh, respectively. The coupled GC and COSMO-RS strategy proved to be a promising tool for predicting HSPs with XGBoostRegressor when screening, designing and selecting CCFs for drugs.
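The data preparation described in the abstract (combining GC and COSMO-RS descriptors into one feature vector, then dividing 181 compounds into 140 training and 41 test compounds) might be sketched as follows. This is only an illustration under stated assumptions: the descriptor lists are hypothetical apart from CH3, M2 and MHbdon3, which the abstract names; the values are synthetic placeholders, not data from the paper; and the random split is an assumption, since the abstract does not say how the test set was chosen.

```python
import random

# Hypothetical descriptor names; only CH3, M2 and MHbdon3 appear in the abstract.
GC_GROUPS = ["CH3", "CH2", "OH"]
COSMO_DESCRIPTORS = ["M2", "M3", "MHbdon3"]


def build_feature_vector(gc_counts, cosmo_values):
    """Concatenate GC group counts and COSMO-RS descriptors in a fixed order."""
    gc_part = [float(gc_counts.get(g, 0)) for g in GC_GROUPS]
    cosmo_part = [float(cosmo_values[d]) for d in COSMO_DESCRIPTORS]
    return gc_part + cosmo_part


def split_dataset(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into training and test sets.

    A seeded random split is an assumption made for this sketch.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Synthetic placeholder records standing in for the 181 compounds (not real data).
rng = random.Random(42)
records = []
for i in range(181):
    gc = {g: rng.randint(0, 4) for g in GC_GROUPS}
    cosmo = {d: rng.uniform(0.0, 100.0) for d in COSMO_DESCRIPTORS}
    records.append((f"compound_{i}", build_feature_vector(gc, cosmo)))

train_set, test_set = split_dataset(records, n_train=140)
print(len(train_set), len(test_set))  # 140 41
```

Feature vectors built this way could then be passed to any of the three regressors named in the abstract, with one model fitted per target (δd, δp, δh).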
Pages: 14