Reformulating Reactivity Design for Data-Efficient Machine Learning

Citations: 3
Authors
Lewis-Atwell, Toby [1, 2]
Beechey, Daniel [2]
Simsek, Ozgur [2]
Grayson, Matthew N. [1]
Affiliations
[1] Univ Bath, Dept Chem, Bath BA2 7AY, England
[2] Univ Bath, Dept Comp Sci, Bath BA2 7AY, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
machine learning; activation barriers; catalyst design; organic synthesis; data efficiency; REACTION BARRIERS; PREDICTION; ACTIVATION; CHEMISTRY
DOI
10.1021/acscatal.3c02513
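Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract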
Machine learning (ML) can deliver rapid and accurate reaction barrier predictions for use in rational reactivity design. However, model training requires large data sets, typically thousands or tens of thousands of barriers, that are very expensive to obtain computationally or experimentally. Furthermore, bespoke data sets are required for each region of interest in reaction space, as models typically struggle to generalize. We have therefore reformulated the ML barrier prediction problem toward a much more data-efficient process: finding a reaction from a prespecified set with a desired target value. Our reformulation enables the rapid selection of reactions with purpose-specific activation barriers, for example, in the design of reactivity and selectivity in synthesis, catalyst design, toxicology, and covalent drug discovery, requiring just tens of accurately measured barriers. Importantly, our reformulation does not require generalization beyond the domain of the data set at hand, and we show excellent results for the highly toxicologically and synthetically relevant data sets of aza-Michael addition and transition-metal-catalyzed dihydrogen activation, typically requiring fewer than 20 accurately measured density functional theory (DFT) barriers. Even for incomplete data sets of E2 and SN2 reactions, with high numbers of missing barriers (74% and 56%, respectively), our chosen ML search method still requires significantly fewer data points than the hundreds or thousands needed for more conventional uses of ML to predict activation barriers. Finally, we include a case study in which we use our process to guide the optimization of the dihydrogen activation catalyst. Our approach identified a reaction within 1 kcal mol⁻¹ of the target barrier after running only 12 DFT reaction barrier calculations, which illustrates the usage and real-world applicability of this reformulation for systems of high synthetic importance.
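The reformulation described in the abstract (search a prespecified set of reactions for one whose barrier is near a target, instead of predicting every barrier) can be illustrated with a toy loop. The following is a minimal sketch and not the authors' search method, which the abstract does not specify. It assumes each candidate has a cheap approximate barrier, that an expensive `oracle` stands in for a DFT calculation, and that a one-variable linear fit is an adequate surrogate. All names (`fit_line`, `search_for_target`, `cheap`, `oracle`) and the synthetic data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b.

    Falls back to slope 1 with a constant offset if x has no spread.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return 1.0, my - mx
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def search_for_target(cheap, oracle, target, tol=1.0, n_init=3, budget=20, seed=0):
    """Find a candidate whose expensive value is within `tol` of `target`.

    cheap  : list of inexpensive estimates, one per candidate (assumed input)
    oracle : function index -> expensive value (stand-in for a DFT barrier)
    Returns (index of best measured candidate, dict of all measured values).
    """
    rng = random.Random(seed)
    measured = {}
    # Seed the surrogate with a few randomly chosen expensive evaluations.
    for i in rng.sample(range(len(cheap)), n_init):
        measured[i] = oracle(i)
    while True:
        best = min(measured, key=lambda i: abs(measured[i] - target))
        done = abs(measured[best] - target) <= tol
        exhausted = len(measured) >= min(budget, len(cheap))
        if done or exhausted:
            return best, measured
        # Refit the surrogate on everything measured so far.
        a, b = fit_line([cheap[i] for i in measured],
                        [measured[i] for i in measured])
        # Evaluate the unmeasured candidate predicted to be closest to target.
        remaining = [i for i in range(len(cheap)) if i not in measured]
        nxt = min(remaining, key=lambda i: abs(a * cheap[i] + b - target))
        measured[nxt] = oracle(nxt)


# Synthetic illustration only: 40 candidates, expensive value linear in cheap one.
cheap = [10.0 + 0.5 * i for i in range(40)]
true_barriers = [1.2 * c + 3.0 for c in cheap]
target = 25.0  # kcal/mol, arbitrary for the example
best, measured = search_for_target(cheap, lambda i: true_barriers[i], target)
print(f"candidate {best}: barrier {measured[best]:.2f} "
      f"after {len(measured)} expensive evaluations")
```

The point of the sketch is that the expensive evaluations concentrate near the target instead of covering the whole candidate set, which is why the number of measured barriers can stay small. On real data the surrogate would need to handle noise and nonlinearity, and the stopping tolerance and budget would be set by the application.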
Pages: 13506-13515
Page count: 10