RegioML: predicting the regioselectivity of electrophilic aromatic substitution reactions using machine learning

Cited by: 10
Authors
Ree, Nicolai [1]
Goeller, Andreas H. [2]
Jensen, Jan H. [1]
Affiliations
[1] Univ Copenhagen, Dept Chem, Univ Pk 5, DK-2100 Copenhagen O, Denmark
[2] Bayer AG, Pharmaceut, R&D, Computat Mol Design, D-42096 Wuppertal, Germany
Source
DIGITAL DISCOVERY | 2022 / Vol. 1 / No. 02
Keywords
ROBUST; SITE;
DOI
10.1039/d1dd00032b
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with a light gradient boosting machine (LightGBM). The model is trained and tested on 21 201 bromination reactions with 101k reaction centers, which are split into training, test, and out-of-sample datasets with 58k, 15k, and 27k reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to that of the graph-based WLN method developed by Struble et al. (React. Chem. Eng., 2020, 5, 896-902), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules were used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (Nicolai Ree, Andreas H. Goller, Jan H. Jensen, J. Cheminf., 2021, 13, 10), where the precision is only 75%. Even for the out-of-sample dataset, RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions where there is little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.
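The abstract reports two per-atom metrics: accuracy, and precision defined as the percentage of positive predictions that are correct. The following is a minimal sketch of how these two quantities could be computed for binary reaction-center labels. The function names and the toy label lists are hypothetical illustrations, not the authors' code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of atoms whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Fraction of positive predictions that are correct: TP / (TP + FP)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    predicted_pos = true_pos + false_pos
    # Assumed convention: return 0.0 when the model predicts no positives.
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical toy labels: 1 = atom is a reactive site, 0 = it is not.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
print(f"accuracy = {acc:.2f}, precision = {prec:.2f}")
```

Because most aromatic atoms in a molecule are not reactive sites, accuracy can stay high even when a model predicts few positives correctly, which may be why precision is reported alongside it.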
Pages: 108-114
Number of pages: 7