ChemBERTa embeddings and ensemble learning for prediction of density and melting point of deep eutectic solvents with hybrid features

Cited by: 0

Authors:

Wu, Ting [1]

Zhan, Peilin [2]

Chen, Wei [2]

Lin, Miaoqing [1]

Qiu, Quanyuan [1]

Hu, Yinan [1]

Song, Jiuhang [1]

Lin, Xiaoqing [1]

Affiliations:

[1] Guangdong Univ Technol, Sch Chem Engn & Light Ind, Guangdong Prov Key Lab Plant Resources Biorefinery, 100 Waihuan Xi Rd, Guangzhou 510006, Peoples R China

[2] Guangdong Univ Technol, Sch Comp Sci & Technol, 100 Waihuan Xi Rd, Guangzhou 510006, Peoples R China

Source:

COMPUTERS & CHEMICAL ENGINEERING | 2025 / Vol. 196

Funding:

National Natural Science Foundation of China;

Keywords:

Deep eutectic solvents; Melting point; Density; Ensemble learning; ChemBERTa; CHOLINE; TOXICITY; BIODEGRADABILITY; OPTIMIZATION; EXTRACTION;

DOI:

10.1016/j.compchemeng.2025.109065

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification codes:

081203; 0835;

Abstract:

Deep eutectic solvents (DESs) are sustainable alternatives to traditional solvents, but their structural complexity makes accurate prediction of melting points and densities challenging. This study uses ChemBERTa, a pre-trained Transformer model, to extract high-dimensional embeddings from Simplified Molecular Input Line Entry System (SMILES) strings, capturing complex molecular interactions and subtle structural features. Through feature importance analysis, we identified molecular information missing from the ChemBERTa embeddings and supplemented it with selected physicochemical descriptors from RDKit, creating a feature set that enhances both interpretability and predictive accuracy. Optimized ensemble models, including ExtraTreesRegressor (ETR) and XGBRegressor (XGBR), are then applied to this enriched feature set, achieving notable improvements in prediction accuracy for DES melting point and density. Rigorous grid search and ten-fold cross-validation ensure model robustness and generalizability. Experimental results confirm the effectiveness of this approach, underscoring the transformative role of pre-trained deep learning models in chemical informatics and supporting scalable, sustainable DES design.
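The workflow described in the abstract (hybrid features, an ensemble regressor, grid search, ten-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the arrays are synthetic placeholders standing in for ChemBERTa embeddings and RDKit descriptors, and the array sizes, parameter grid, and target are assumptions chosen only to keep the example small. Only ExtraTreesRegressor is shown; XGBRegressor could presumably be swapped in the same way.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT real data). In the study, the first block would be
# ChemBERTa embeddings of SMILES strings and the second RDKit descriptors.
n_samples = 60
embeddings = rng.normal(size=(n_samples, 16))   # placeholder embedding vectors
descriptors = rng.normal(size=(n_samples, 4))   # placeholder physicochemical descriptors

# Hybrid feature set: concatenate the two blocks column-wise.
X = np.hstack([embeddings, descriptors])

# Synthetic target that depends on both blocks (hypothetical, for illustration).
y = embeddings[:, 0] + 0.5 * descriptors[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Small, assumed hyperparameter grid searched with ten-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
cv = KFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    ExtraTreesRegressor(random_state=0), param_grid, cv=cv, scoring="r2"
)
search.fit(X, y)

print("best params:", search.best_params_)
print("mean CV R^2:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_.feature_importances_` gives per-column importances, which is one way to check whether the descriptor columns contribute information the embedding columns lack.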


Pages: 12

