Text-based representations with interpretable machine learning reveal structure-property relationships of polybenzenoid hydrocarbons

Cited by: 12
Authors
Fite, Shachar [1]
Wahab, Alexandra [2]
Paenurk, Eno [2]
Gross, Zeev [1]
Gershoni-Poranne, Renana [1]
Affiliations
[1] Technion Israel Inst Technol, Schulich Fac Chem, IL-32000 Haifa, Israel
[2] Swiss Fed Inst Technol, Lab Organ Chem, Dept Chem & Appl Biosci, Zurich, Switzerland
Keywords
machine learning; molecular design; molecular representation; polycyclic aromatic hydrocarbons; structure-property relationships; proposed nomenclature; local aromaticity; chemical graphs; benzenoids; stability; energies; rings
DOI
10.1002/poc.4458
Chinese Library Classification (CLC) number
O62 [Organic chemistry]
Discipline classification code
070303; 081704
Abstract
New tools are developed and applied to enable the use of interpretable machine learning for investigating structure-property relationships in polybenzenoid hydrocarbons (PBHs). A textual molecular representation, based on the annulation sequence of PBHs, is shown to be useful either in its textual form or as the basis for a curated feature vector. Both forms display interpretability exceeding that achievable with the standard SMILES representation, and the former also has increased predictive accuracy. A recently developed model, CUSTODI, was applied for the first time as an interpretable model, identifying important structural features that impact various electronic molecular properties. The resulting insights not only validate several well-known "rules of thumb" of organic chemistry but also reveal new behaviors and influential structural motifs, thus providing guiding principles for the rational design and fine-tuning of PBHs.
Pages: 17