Data Augmentation and Pretraining for Template-Based Retrosynthetic Prediction in Computer-Aided Synthesis Planning

Cited by: 57
Authors
Fortunato, Michael E. [1]
Coley, Connor W. [1]
Barnes, Brian C. [2]
Jensen, Klavs F. [1]
Affiliations
[1] MIT, Dept Chem Engn, Cambridge, MA 02139 USA
[2] CCDC Army Res Lab, Detonat Sci & Modeling Branch, Aberdeen Proving Ground, MD 21005 USA
Keywords
NEURAL-NETWORKS
DOI
10.1021/acs.jcim.0c00403
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
This work presents efforts to augment the performance of data-driven machine learning algorithms for reaction template recommendation used in computer-aided synthesis planning software. Often, machine learning models designed to perform the task of prioritizing reaction templates or molecular transformations are focused on reporting high-accuracy metrics for the one-to-one mapping of product molecules in reaction databases to the template extracted from the recorded reaction. The available templates that get selected for inclusion in these machine learning models have been previously limited to those that appear frequently in the reaction databases and exclude potentially useful transformations. By augmenting open-access data sets of organic reactions with explicitly calculated template applicability and pretraining a template-relevance neural network on this augmented applicability data set, we report an increase in the template applicability recall and an increase in the diversity of predicted precursors. The augmentation and pretraining effectively teach the neural network an increased set of templates that could theoretically lead to successful reactions for a given target. Even on a small data set of well-curated reactions, the data augmentation and pretraining methods resulted in an increase in top-1 accuracy, especially for rare templates, indicating that these strategies can be very useful for small data sets.
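The applicability augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder patterns, and the substring-based applicability check are all hypothetical stand-ins (a real pipeline would apply reaction templates with a cheminformatics toolkit), and the recall function shows only one plausible way to define top-k applicability recall.

```python
from typing import Callable, List, Sequence


def toy_applies(template_pattern: str, product_smiles: str) -> bool:
    """Hypothetical applicability check (assumption, not the paper's method).

    A real check would test whether the template's product-side pattern
    matches the molecule. Here a template "applies" when its placeholder
    pattern string occurs in the SMILES string.
    """
    return template_pattern in product_smiles


def applicability_labels(
    product_smiles: str,
    templates: Sequence[str],
    applies: Callable[[str, str], bool] = toy_applies,
) -> List[int]:
    """Build a multi-hot label vector: 1 for every applicable template.

    This replaces the usual one-hot label (only the recorded template)
    with all templates that could in principle be applied to the product.
    """
    return [1 if applies(t, product_smiles) else 0 for t in templates]


def topk_applicability_recall(
    scores: Sequence[float], labels: Sequence[int], k: int
) -> float:
    """Fraction of applicable templates ranked in the model's top k.

    One plausible definition, assumed here for illustration.
    """
    applicable = {i for i, y in enumerate(labels) if y}
    if not applicable:
        return 0.0
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return len(applicable.intersection(ranked[:k])) / len(applicable)


# Placeholder template patterns and a made-up score vector.
templates = ["C(=O)O", "N", "Br", "C(=O)N"]
labels = applicability_labels("CC(=O)Nc1ccccc1", templates)
scores = [0.1, 0.2, 0.3, 0.9]
recall_at_2 = topk_applicability_recall(scores, labels, k=2)
recall_at_3 = topk_applicability_recall(scores, labels, k=3)
```

In this toy setup, a network would be pretrained against the multi-hot `labels` and then fine-tuned on the recorded one-to-one product-to-template pairs; the sketch covers only the label construction and the metric, not the training itself.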
Pages: 3398-3407
Number of pages: 10