Predicting chemical reaction outcomes: A grammar ontology-based transformer framework

Cited by: 35
Authors
Mann, Vipul [1]
Venkatasubramanian, Venkat [1]
Affiliations
[1] Columbia Univ, Dept Chem Engn, New York, NY 10027 USA
Keywords
artificial intelligence; computational chemistry (reaction modeling); context-free grammar; computational chemistry (organic reactions); neural machine translation; DESIGN; MODELS
DOI
10.1002/aic.17190
Chinese Library Classification (CLC) code
TQ [Chemical industry]
Discipline classification code
0817
Abstract
Discovering and designing novel materials is a challenging problem as it often requires searching through a combinatorially large space of potential candidates, typically requiring great amounts of effort, time, expertise, and money. The ability to predict reaction outcomes without performing extensive experiments is, therefore, important. Toward that goal, we report an approach that uses context-free grammar-based representations of molecules in a neural machine translation framework. This involves discovering the transformations from the source sequence (comprising the reactants and agents) to the target sequence (comprising the major product) in the reaction. The grammar ontology-based representation hierarchically incorporates rich molecular-structure information, ensures syntactic validity of predictions, and overcomes over-parameterization in complex machine learning architectures. We achieve an accuracy of 80.1% (86.3% top-2 accuracy) and 99% syntactic validity of predictions on a standard reaction dataset. Moreover, our model is characterized by only a fraction of the number of training parameters used in other similar works in this area.
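As a rough illustration of what a context-free grammar-based representation of a molecule can look like, the sketch below parses a tiny SMILES subset into a sequence of production rules and then rebuilds the string from that sequence. Everything in it is an assumption made for illustration: the toy grammar (nonterminals `smiles`, `rest`, `link`, `atom`, `bond`), the supported characters, and the function names `smiles_to_rules` and `rules_to_smiles` are hypothetical and are not the grammar or code used by the authors. In a translation setting, such rule sequences would stand in for the source and target tokens, and decoding through grammar rules is one way syntactically valid output strings could be enforced.

```python
# Toy grammar (hypothetical, NOT the grammar from the paper):
#   smiles -> atom rest
#   rest   -> link rest | '(' link rest ')' rest | (empty)
#   link   -> bond atom | atom
#   atom   -> 'C' | 'N' | 'O'
#   bond   -> '=' | '#'
ATOMS = {"C", "N", "O"}
BONDS = {"=", "#"}
NONTERMINALS = {"smiles", "rest", "link", "atom", "bond"}


def smiles_to_rules(smiles):
    """Return the leftmost derivation of `smiles` as a list of (lhs, rhs) rules."""
    rules = []
    pos = 0

    def peek():
        return smiles[pos] if pos < len(smiles) else None

    def parse_atom():
        nonlocal pos
        ch = peek()
        if ch not in ATOMS:
            raise ValueError(f"expected an atom at position {pos}")
        rules.append(("atom", (ch,)))
        pos += 1

    def parse_link():
        nonlocal pos
        ch = peek()
        if ch in BONDS:
            rules.append(("link", ("bond", "atom")))
            rules.append(("bond", (ch,)))
            pos += 1
        else:
            rules.append(("link", ("atom",)))
        parse_atom()

    def parse_rest():
        nonlocal pos
        ch = peek()
        if ch in ATOMS or ch in BONDS:
            rules.append(("rest", ("link", "rest")))
            parse_link()
            parse_rest()
        elif ch == "(":
            rules.append(("rest", ("(", "link", "rest", ")", "rest")))
            pos += 1
            parse_link()
            parse_rest()
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            parse_rest()
        else:
            rules.append(("rest", ()))  # empty production

    rules.append(("smiles", ("atom", "rest")))
    parse_atom()
    parse_rest()
    if pos != len(smiles):
        raise ValueError(f"unexpected character at position {pos}")
    return rules


def rules_to_smiles(rules):
    """Replay a leftmost derivation and return the string it generates."""
    stack = ["smiles"]  # top of the stack is the leftmost pending symbol
    out = []
    for lhs, rhs in rules:
        # Emit terminals until the next nonterminal is on top.
        while stack and stack[-1] not in NONTERMINALS:
            out.append(stack.pop())
        if not stack or stack[-1] != lhs:
            raise ValueError("rule sequence is not a valid derivation")
        stack.pop()
        stack.extend(reversed(rhs))
    while stack and stack[-1] not in NONTERMINALS:
        out.append(stack.pop())
    if stack:
        raise ValueError("derivation is incomplete")
    return "".join(out)


if __name__ == "__main__":
    derivation = smiles_to_rules("CC(=O)O")  # acetic acid
    for lhs, rhs in derivation:
        print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
    print(rules_to_smiles(derivation))
```

Because every string is rebuilt by replaying rules from the start symbol, any complete rule sequence that `rules_to_smiles` accepts yields a string inside the toy grammar; a real system would need a far richer grammar covering rings, charges, aromaticity, and stereochemistry.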
Pages: 13
Related papers
42 in total
  • [1] [Anonymous], 2012, Extraction of chemical structures and reactions from the literature
  • [2] [Anonymous], 2017, Grammar Variational Autoencoder
  • [3] Bahdanau D., 2014, ARXIV PREPRINT ARXIV
  • [4] Optimization in polymer design using connectivity indices
    Camarda, KV
    Maranas, CD
    [J]. INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 1999, 38 (05) : 1884 - 1892
  • [5] CHOMSKY N, 1956, IRE T INFORM THEOR, V2, P113
  • [6] A graph-convolutional neural network model for the prediction of chemical reactivity
    Coley, Connor W.
    Jin, Wengong
    Rogers, Luke
    Jamison, Timothy F.
    Jaakkola, Tommi S.
    Green, William H.
    Barzilay, Regina
    Jensen, Klavs F.
    [J]. CHEMICAL SCIENCE, 2019, 10 (02) : 370 - 377
  • [7] Prediction of Organic Reaction Outcomes Using Machine Learning
    Coley, Connor W.
    Barzilay, Regina
    Jaakkola, Tommi S.
    Green, William H.
    Jensen, Klavs F.
    [J]. ACS CENTRAL SCIENCE, 2017, 3 (05) : 434 - 443
  • [8] Deep learning for molecular design-a review of the state of the art
    Elton, Daniel C.
    Boukouvalas, Zois
    Fuge, Mark D.
    Chung, Peter W.
    [J]. MOLECULAR SYSTEMS DESIGN & ENGINEERING, 2019, 4 (04) : 828 - 849
  • [9] CO: A chemical ontology for identification of functional groups and semantic comparison of small molecules
    Feldman, HJ
    Dumontier, M
    Ling, S
    Haider, N
    Hogue, CWV
    [J]. FEBS LETTERS, 2005, 579 (21) : 4685 - 4691
  • [10] Using Machine Learning To Predict Suitable Conditions for Organic Reactions
    Gao, Hanyu
    Struble, Thomas J.
    Coley, Connor W.
    Wang, Yuran
    Green, William H.
    Jensen, Klavs F.
    [J]. ACS CENTRAL SCIENCE, 2018, 4 (11) : 1465 - 1476