ChemBERTa embeddings and ensemble learning for prediction of density and melting point of deep eutectic solvents with hybrid features

Citations: 0
Authors
Wu, Ting [1]
Zhan, Peilin [2]
Chen, Wei [2]
Lin, Miaoqing [1]
Qiu, Quanyuan [1]
Hu, Yinan [1]
Song, Jiuhang [1]
Lin, Xiaoqing [1]
Affiliations
[1] Guangdong Univ Technol, Sch Chem Engn & Light Ind, Guangdong Prov Key Lab Plant Resources Biorefinery, 100 Waihuan Xi Rd, Guangzhou 510006, Peoples R China
[2] Guangdong Univ Technol, Sch Comp Sci & Technol, 100 Waihuan Xi Rd, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep eutectic solvents; Melting point; Density; Ensemble learning; ChemBERTa; CHOLINE; TOXICITY; BIODEGRADABILITY; OPTIMIZATION; EXTRACTION;
DOI
10.1016/j.compchemeng.2025.109065
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Deep eutectic solvents (DESs) are sustainable alternatives to traditional solvents, but their structural complexity makes accurate prediction of melting points and densities challenging. This study uses ChemBERTa, a pretrained Transformer model, to extract high-dimensional embeddings from Simplified Molecular Input Line Entry System (SMILES) strings, capturing complex molecular interactions and subtle structural features. Through feature importance analysis, we identify molecular information missing from the ChemBERTa embeddings and supplement it with selected physicochemical descriptors from RDKit, creating a hybrid feature set that enhances both interpretability and predictive accuracy. Optimized ensemble models, including ExtraTreesRegressor (ETR) and XGBRegressor (XGBR), are then applied to this enriched feature set, achieving notable improvements in prediction accuracy for DES melting point and density. Rigorous grid search and ten-fold cross-validation ensure model robustness and generalizability. Experimental results confirm the effectiveness of this approach, underscoring the transformative role of pretrained deep learning models in chemical informatics and supporting scalable, sustainable DES design.
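
The workflow summarized in the abstract (embeddings concatenated with descriptors, then a tree ensemble tuned by grid search under ten-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: random arrays stand in for real ChemBERTa embeddings and RDKit descriptors, the target is synthetic, and the array sizes, the `build_hybrid_features` helper, and the hyperparameter grid are all assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n_samples = 80

# Assumption: random arrays stand in for real ChemBERTa embeddings
# and RDKit physicochemical descriptors (dimensions are illustrative).
embeddings = rng.normal(size=(n_samples, 32))
descriptors = rng.normal(size=(n_samples, 4))

# Synthetic target that depends on both feature blocks, so neither
# block alone carries all of the signal.
target = (
    embeddings[:, 0]
    + 2.0 * descriptors[:, 1]
    + rng.normal(scale=0.1, size=n_samples)
)


def build_hybrid_features(emb, desc):
    """Hypothetical helper: concatenate the two feature blocks column-wise."""
    return np.hstack([emb, desc])


X = build_hybrid_features(embeddings, descriptors)

# Illustrative grid; the hyperparameters searched in the paper are not
# given in the abstract.
param_grid = {"n_estimators": [50, 100], "max_features": [0.5, 1.0]}

# Ten-fold cross-validation, as described in the abstract.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

search = GridSearchCV(
    ExtraTreesRegressor(random_state=0),
    param_grid,
    cv=cv,
    scoring="neg_mean_absolute_error",
)
search.fit(X, target)

print("best parameters:", search.best_params_)
print("cross-validated MAE:", -search.best_score_)
```

With real data, the stand-in arrays would be replaced by embeddings computed from SMILES strings and by the chosen descriptors, and `XGBRegressor` could be swapped in for `ExtraTreesRegressor` under the same search setup.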
Pages: 12