Benchmarking machine learning methods for modeling physical properties of ionic liquids

Times cited: 40
Authors
Baskin, Igor [1]
Epshtein, Alon [1]
Ein-Eli, Yair [1, 2]
Affiliations
[1] Technion Israel Inst Technol, Dept Mat Sci & Engn, IL-3200003 Haifa, Israel
[2] Technion Israel Inst Technol, Grand Technion Energy Program GTEP, IL-3200003 Haifa, Israel
Keywords
Ionic liquids; Machine learning; Neural networks; QSPR; OCHEM; GROUP-CONTRIBUTION QSPRS; NEURAL-NETWORK; EXTENSIVE DATABASES; MELTING-POINT; PREDICTION; DESCRIPTORS; VISCOSITY; CHEMOINFORMATICS; CONDUCTIVITY; LANGUAGE
DOI
10.1016/j.molliq.2022.118616
Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Subject classification codes
070304; 081704
Abstract
Quantitative prediction of the properties of ionic liquids (ILs) with quantitative structure-property relationship (QSPR) models is of great importance, which makes it necessary to understand which modern machine learning (ML) methods, combined with which types of molecular representations, are preferable for this purpose. To address this problem, a large-scale benchmarking study was carried out on QSPR models that combine three traditional ML methods and neural networks of seven different architectures with five types of molecular representations (either numerical molecular descriptors or SMILES text strings) to predict six important physical properties of ILs: density, electrical conductance, melting point, refractive index, surface tension, and viscosity. The datasets contain between 407 and 1204 diverse ILs composed of various organic and inorganic ions. Multi-task learning was used to build QSPR models that predict the properties of ILs at eight different temperatures. The best combinations of ML methods and molecular representations were identified for each property, and a unified ranking system was introduced to rank and prioritize the different ML methods and molecular representations. The study showed that, on average: (i) nonlinear ML methods perform much better than linear ones; (ii) neural networks perform better than traditional ML methods; and (iii) Transformers, which are widely used in natural language processing (NLP), perform better than other types of neural networks owing to their advanced ability to analyze the chemical structures of ILs encoded as SMILES text strings. A special "component-wise" cross-validation scheme was applied to assess how much the predictive performance deteriorates for ILs composed of cations and anions that are not present in the dataset. (C) 2022 Elsevier B.V. All rights reserved.
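The idea behind component-wise cross-validation can be illustrated with a minimal Python sketch. It assumes that each IL is given as a (cation, anion) pair and that "component-wise" means forming folds by ion identity, so that the ions of the chosen kind in a test fold never occur in the corresponding training set. The function name component_wise_folds, the round-robin assignment of ions to folds, and the toy ion abbreviations are hypothetical illustrations; the authors' actual scheme may differ in its details.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: an IL is a (cation, anion) pair of identifiers.
IonicLiquid = Tuple[str, str]


def component_wise_folds(
    ils: Sequence[IonicLiquid],
    n_folds: int = 5,
    component: str = "cation",
    seed: int = 0,
) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs in which every test IL's
    ion of the chosen kind is absent from the training indices."""
    pos = {"cation": 0, "anion": 1}[component]
    # Collect the distinct ions, then shuffle them reproducibly.
    ions = sorted({il[pos] for il in ils})
    if len(ions) < n_folds:
        raise ValueError("need at least n_folds distinct ions")
    rng = random.Random(seed)
    rng.shuffle(ions)
    # Deal ions round-robin into folds: every IL containing a given ion
    # lands in the same fold, so that ion is never seen during training.
    fold_of = {ion: i % n_folds for i, ion in enumerate(ions)}
    folds = []
    for f in range(n_folds):
        test = [i for i, il in enumerate(ils) if fold_of[il[pos]] == f]
        train = [i for i, il in enumerate(ils) if fold_of[il[pos]] != f]
        folds.append((train, test))
    return folds


# Toy dataset (hypothetical, for illustration only).
toy = [
    ("EMIM", "BF4"), ("EMIM", "PF6"), ("BMIM", "BF4"),
    ("BMIM", "NTf2"), ("HMIM", "PF6"), ("BMPyrr", "NTf2"),
]

for train, test in component_wise_folds(toy, n_folds=3):
    seen = {toy[i][0] for i in train}
    # No cation in the test fold was seen during training.
    assert all(toy[i][0] not in seen for i in test)
```

Grouping by one kind of ion at a time is the simplest variant; a stricter split in which both the cation and the anion of every test IL are unseen would have to discard ILs that pair a held-out ion with a training ion.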
Pages: 12