Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki-Miyaura Coupling

Cited by: 109
Authors
Beker, Wiktor [1, 5]
Roszak, Rafal [1, 5]
Wolos, Agnieszka [1, 5]
Angello, Nicholas H. [2]
Rathore, Vandana [2]
Burke, Martin D. [2, 3, 4]
Grzybowski, Bartosz A. [1, 5, 6, 7]
Affiliations
[1] Allchemy Inc, Highland, IN 46322 USA
[2] Univ Illinois, Dept Chem, Urbana, IL 61801 USA
[3] Univ Illinois, Carle Illinois Coll Med, Inst Genom Biol, Dept Biochem, Urbana, IL 61801 USA
[4] Univ Illinois, Beckman Inst, Urbana, IL 61801 USA
[5] Polish Acad Sci, Inst Organ Chem, PL-01224 Warsaw, Poland
[6] Inst Basic Sci IBS, Ctr Soft & Living Matter, Ulsan 44919, South Korea
[7] Ulsan Inst Sci & Technol UNIST, Dept Chem, Ulsan 44919, South Korea
Keywords
CROSS-COUPLING REACTIONS; PREDICTION; CLASSIFICATION; ALGORITHM; COMPUTER; ALLOWS; GO;
DOI
10.1021/jacs.1c12005
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Applications of machine learning (ML) to synthetic chemistry rely on the assumption that large numbers of literature-reported examples should enable construction of accurate and predictive models of chemical reactivity. This paper demonstrates that abundance of carefully curated literature data may be insufficient for this purpose. Using an example of Suzuki-Miyaura coupling with heterocyclic building blocks (and a carefully selected database of >10,000 literature examples), we show that ML models cannot offer any meaningful predictions of optimum reaction conditions, even if the search space is restricted to only solvents and bases. This result holds irrespective of the ML model applied (from simple feed-forward to state-of-the-art graph-convolution neural networks) or the representation to describe the reaction partners (various fingerprints, chemical descriptors, latent representations, etc.). In all cases, the ML methods fail to perform significantly better than naive assignments based on the sheer frequency of certain reaction conditions reported in the literature. These unsatisfactory results likely reflect subjective preferences of various chemists to use certain protocols, other biasing factors as mundane as availability of certain solvents/reagents, and/or a lack of negative data. These findings highlight the likely importance of systematically generating reliable and standardized data sets for algorithm training.
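The "naive assignment based on sheer frequency" that the abstract uses as a yardstick can be illustrated with a minimal sketch. This is not the authors' code: the function names `popularity_baseline` and `top_k_accuracy`, the top-k scoring, and the toy solvent/base pairs are all assumptions made up for illustration. The idea is simply to predict, for every reaction, the conditions most often reported in the training data, and to measure how often that guess matches the held-out reports.

```python
from collections import Counter


def popularity_baseline(train_conditions, k=3):
    """Return the k most frequently reported condition labels (hypothetical helper)."""
    counts = Counter(train_conditions)
    return [label for label, _ in counts.most_common(k)]


def top_k_accuracy(predicted_top_k, test_conditions):
    """Fraction of test reactions whose reported conditions are in the top-k list."""
    if not test_conditions:
        return 0.0
    hits = sum(1 for cond in test_conditions if cond in predicted_top_k)
    return hits / len(test_conditions)


# Toy (solvent, base) labels, invented purely for illustration; not data from the paper.
train = (
    [("dioxane", "K2CO3")] * 5
    + [("toluene", "K3PO4")] * 3
    + [("DMF", "Cs2CO3")] * 2
)
test = [
    ("dioxane", "K2CO3"),
    ("toluene", "K3PO4"),
    ("THF", "NaOH"),
    ("dioxane", "K2CO3"),
]

top2 = popularity_baseline(train, k=2)
baseline_acc = top_k_accuracy(top2, test)
print(top2, baseline_acc)
```

A trained model would be scored with the same metric on the same held-out reactions; under the abstract's claim, its score would not be significantly higher than this structure-blind baseline.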
Pages: 4819-4827
Page count: 9