Reagent prediction with a molecular transformer improves reaction data quality

Cited by: 14
Authors
Andronov, Mikhail [1, 5]
Voinarovska, Varvara [2 ]
Andronova, Natalia [3 ]
Wand, Michael [1, 4]
Clevert, Djork-Arne [5 ]
Schmidhuber, Jurgen [6 ]
Affiliations
[1] IDSIA USI SUPSI, CH-6900 Lugano, Switzerland
[2] Helmholtz Munich Deutsch Forschungszentrum Julich, Inst Struct Biol, Mol Targets & Therapeut Ctr, D-85764 Neuherberg, Germany
[3] Via Berna 9, CH-6900 Lugano, Switzerland
[4] Inst Digital Technol Personalized Healthcare, SUPSI, CH-6900 Lugano, Switzerland
[5] Pfizer Worldwide Res Dev & Med, Machine Learning Res, Link str10, Berlin, Germany
[6] KAUST, AI Initiat, Thuwal 23955, Saudi Arabia
Keywords
ORGANIC-CHEMISTRY; CLASSIFICATION; COMPUTER; MODEL; METHODOLOGY; SYSTEM;
DOI
10.1039/d2sc06798f
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Automated synthesis planning is key for efficient generative chemistry. Since reactions of given reactants may yield different products depending on conditions such as the chemical context imposed by specific reagents, computer-aided synthesis planning should benefit from recommendations of reaction conditions. Traditional synthesis planning software, however, typically proposes reactions without specifying such conditions, relying on human organic chemists who know the conditions to carry out suggested reactions. In particular, reagent prediction for arbitrary reactions, a crucial aspect of condition recommendation, has been largely overlooked in cheminformatics until recently. Here we employ the Molecular Transformer, a state-of-the-art model for reaction prediction and single-step retrosynthesis, to tackle this problem. We train the model on the US patents dataset (USPTO) and test it on Reaxys to demonstrate its out-of-distribution generalization capabilities. Our reagent prediction model also improves the quality of product prediction: the Molecular Transformer is able to substitute the reagents in the noisy USPTO data with reagents that enable product prediction models to outperform those trained on plain USPTO. This makes it possible to improve upon the state-of-the-art in reaction product prediction on the USPTO MIT benchmark.
Pages: 3235-3246
Number of pages: 12