Prediction of chemical reaction yields with large-scale multi-view pre-training

Cited by: 0
Authors
Runhan Shi
Gufeng Yu
Xiaohong Huo
Yang Yang
Affiliations
[1] Shanghai Jiao Tong University, Department of Computer Science and Engineering, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering
[2] Shanghai Jiao Tong University, Shanghai Key Laboratory for Molecular Engineering of Chiral Drugs, Frontiers Science Center for Transformative Molecules, School of Chemistry and Chemical Engineering
Source
Journal of Cheminformatics, Volume 16
Keywords
Chemical reaction yield prediction; Self-supervised learning; Multi-view
DOI
Not available
Abstract
Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have been recently highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions.

Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.