ChemBERTa embeddings and ensemble learning for prediction of density and melting point of deep eutectic solvents with hybrid features

Cited by: 0
Authors
Wu, Ting [1]
Zhan, Peilin [2]
Chen, Wei [2]
Lin, Miaoqing [1]
Qiu, Quanyuan [1]
Hu, Yinan [1]
Song, Jiuhang [1]
Lin, Xiaoqing [1]
Affiliations
[1] Guangdong Univ Technol, Sch Chem Engn & Light Ind, Guangdong Prov Key Lab Plant Resources Biorefinery, 100 Waihuan Xi Rd, Guangzhou 510006, Peoples R China
[2] Guangdong Univ Technol, Sch Comp Sci & Technol, 100 Waihuan Xi Rd, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep eutectic solvents; Melting point; Density; Ensemble learning; ChemBERTa; CHOLINE; TOXICITY; BIODEGRADABILITY; OPTIMIZATION; EXTRACTION;
DOI
10.1016/j.compchemeng.2025.109065
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203; 0835;
Abstract
Deep eutectic solvents (DESs) are sustainable alternatives to traditional solvents, but their structural complexity makes accurate prediction of melting points and densities challenging. This study uses ChemBERTa, a pre-trained Transformer model, to extract high-dimensional embeddings from Simplified Molecular Input Line Entry System (SMILES) strings, capturing complex molecular interactions and subtle structural features. Feature importance analysis identified molecular information missing from the ChemBERTa embeddings, which was supplemented with selected physicochemical descriptors from RDKit, yielding a hybrid feature set that improves both interpretability and predictive accuracy. Optimized ensemble models, including ExtraTreesRegressor (ETR) and XGBRegressor (XGBR), are then applied to this enriched feature set, achieving notable improvements in prediction accuracy for DES melting point and density. Rigorous grid search and ten-fold cross-validation ensure model robustness and generalizability. Experimental results confirm the effectiveness of this approach, underscoring the transformative role of pre-trained deep learning models in chemical informatics and supporting scalable, sustainable DES design.
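
The workflow outlined in the abstract (hybrid features, an ensemble regressor, grid search, ten-fold cross-validation) might be sketched roughly as follows. This is only an illustrative sketch and not the authors' code: the feature arrays are random placeholders standing in for ChemBERTa embeddings and RDKit descriptors, the target is synthetic, and the array sizes, the helper name `build_hybrid_features`, and the hyperparameter grid are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)

# Synthetic placeholders (assumption): in a real pipeline these would be
# ChemBERTa embeddings of SMILES strings and selected RDKit descriptors.
n_samples = 80
embeddings = rng.normal(size=(n_samples, 16))
descriptors = rng.normal(size=(n_samples, 4))

# Synthetic target standing in for a measured property such as density.
target = (2.0 * descriptors[:, 0] + embeddings[:, 0]
          + rng.normal(scale=0.1, size=n_samples))


def build_hybrid_features(emb, desc):
    """Concatenate embedding and descriptor columns into one feature matrix."""
    return np.hstack([emb, desc])


X = build_hybrid_features(embeddings, descriptors)

# Hypothetical, deliberately small hyperparameter grid.
param_grid = {"n_estimators": [50, 100], "max_features": [0.5, 1.0]}

# Ten-fold cross-validation, as mentioned in the abstract.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="r2",
)
search.fit(X, target)

print("best parameters:", search.best_params_)
print("mean cross-validated R^2:", round(search.best_score_, 3))

# Per-feature importances of the refitted best model; the last four
# columns correspond to the descriptor block in this sketch.
importances = search.best_estimator_.feature_importances_
```

An XGBRegressor from the xgboost package could presumably be substituted for the ExtraTreesRegressor in the same `GridSearchCV` call, with a grid suited to its own parameters.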
Pages: 12
Related papers
73 in total
  • [41] Machine Learning for Predicting and Optimizing Physicochemical Properties of Deep Eutectic Solvents: Review and Perspectives
    Lopez-Flores, Francisco Javier
    Ramirez-Marquez, Cesar
    Gonzalez-Campo, J. Betzabe
    Ponce-Ortega, Jose Maria
    [J]. INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2024, 64 (06) : 3103 - 3117
  • [42] The effect of descriptor choice in machine learning models for ionic liquid melting point prediction
    Low, Kaycee
    Kobayashi, Rika
    Izgorodina, Ekaterina I.
    [J]. JOURNAL OF CHEMICAL PHYSICS, 2020, 153 (10)
  • [43] Highly Selective Separation of Levulinic Acid from Bamboo Pulp Hydrolysates by Forming a Hydrophobic Deep Eutectic Solvent with Terpenoids
    Mai, Yinglin
    Yuan, Haotian
    Song, Jiuhang
    Liu, Jingke
    Zeng, Yueren
    Qiu, Quanyuan
    Lin, Xiaoqing
    [J]. INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2023, 62 (49) : 21324 - 21334
  • [44] Liquid-liquid extraction of levulinic acid from aqueous solutions using hydrophobic tri-n-octylamine/alcohol-based deep eutectic solvent
    Mai, Yinglin
    Xian, Xiaoling
    Hu, Lei
    Zhang, Xiaodong
    Zheng, Xiaojie
    Tao, Shunhui
    Lin, Xiaoqing
    [J]. CHINESE JOURNAL OF CHEMICAL ENGINEERING, 2023, 54 : 248 - 256
  • [45] Viscosity of deep eutectic solvents: Predictive modeling with experimental validation
    Makarov, Dmitriy M.
    Kolker, Arkadiy M.
    [J]. FLUID PHASE EQUILIBRIA, 2025, 587
  • [46] Supramolecular deep eutectic solvents in extraction processes: a review
    Makos-Chelstowska, Patrycja
    Slupek, Edyta
    Fourmentin, Sophie
    Gebicki, Jacek
    [J]. ENVIRONMENTAL CHEMISTRY LETTERS, 2025, 23 (01) : 41 - 65
  • [47] A conceptual framework for understanding phase separation and addressing open questions and challenges
    Mittag, Tanja
    Pappu, Rohit V.
    [J]. MOLECULAR CELL, 2022, 82 (12) : 2201 - 2214
  • [48] Application of the Eötvös and Guggenheim empirical rules for predicting the density and surface tension of ionic liquids analogues
    Mjalli, Farouq S.
    Vakili-Nezhaad, Gholamreza
    Shahbaz, Kaveh
    AlNashef, Inas M.
    [J]. THERMOCHIMICA ACTA, 2014, 575 : 40 - 44
  • [49] Graph neural networks for CO2 solubility predictions in Deep Eutectic Solvents
    Morales, Gabriel Hernandez
    Medina, Edgar Ivan Sanchez
    Jimenez-Gutierrez, Arturo
    Zavala, Victor M.
    [J]. COMPUTERS & CHEMICAL ENGINEERING, 2024, 187
  • [50] DESignSolvents: an open platform for the search and prediction of the physicochemical properties of deep eutectic solvents
    Odegova, Valeria
    Lavrinenko, Anastasia
    Rakhmanov, Timur
    Sysuev, George
    Dmitrenko, Andrei
    Vinogradov, Vladimir
    [J]. GREEN CHEMISTRY, 2024, 26 (07) : 3958 - 3967