Prediction of chemical reaction yields with large-scale multi-view pre-training

Times cited: 4

Authors:

Shi, Runhan [1,2]

Yu, Gufeng [1,2]

Huo, Xiaohong [3]

Yang, Yang [1,2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interact, Shanghai 200240, Peoples R China

[3] Shanghai Jiao Tong Univ, Shanghai Key Lab Mol Engn & Chiral Drugs, Frontiers Sci Ctr Transformat Mol, Sch Chem & Chem Engn, Shanghai 200240, Peoples R China

Source:

JOURNAL OF CHEMINFORMATICS | 2024 / Vol. 16 / Issue 01

Funding:

National Natural Science Foundation of China

Keywords:

Chemical reaction yield prediction; Self-supervised learning; Multi-view; INFORMATION; LANGUAGE; SMILES; MODEL

DOI:

10.1186/s13321-024-00815-2

Chinese Library Classification:

O6 [Chemistry]

Subject classification code:

0703 ;

Abstract:

Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which is commonly learned from SMILES strings or molecular graphs using deep neural networks. However, the progression of a chemical reaction is inherently determined by the 3D geometric properties of the molecules involved. These properties have recently been highlighted as crucial features for accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential for enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced ability to generalize to new reactions.

Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.
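The abstract describes aligning a sequential view and a geometric view of the same reaction through self-supervised learning. As a rough illustration of that idea only, the sketch below computes a symmetric InfoNCE-style contrastive loss between two batches of embeddings. The abstract does not state ReaMVP's actual objective. The choice of InfoNCE is an assumption, as are the hypothetical function names `cosine`, `_cross_entropy_on_diagonal` and `info_nce`, the `temperature` value, and the toy 2-D embeddings. In a real model the embeddings would presumably come from a sequence encoder and a 3D geometric encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _cross_entropy_on_diagonal(anchors, candidates, temperature):
    """Mean of -log softmax(logits_i)[i]; the positive for anchor i is candidate i."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], candidates[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / n


def info_nce(seq_emb, geo_emb, temperature=0.1):
    """Hypothetical symmetric InfoNCE loss between two views of one batch.

    Row i of seq_emb (sequential view, e.g. SMILES-based) and row i of geo_emb
    (geometric view, e.g. 3D-conformer-based) form a positive pair; every other
    row in the batch serves as a negative. This is an illustrative assumption,
    not ReaMVP's published loss.
    """
    seq_to_geo = _cross_entropy_on_diagonal(seq_emb, geo_emb, temperature)
    geo_to_seq = _cross_entropy_on_diagonal(geo_emb, seq_emb, temperature)
    return 0.5 * (seq_to_geo + geo_to_seq)


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0]]
    geo_aligned = [[0.9, 0.1], [0.1, 0.9]]
    geo_swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce(seq, geo_aligned))  # small: each positive pair is the most similar
    print(info_nce(seq, geo_swapped))  # large: each positive pair is the least similar
```

Averaging the two directions treats the sequential and geometric views symmetrically. The sketch covers only this alignment idea. It does not reproduce ReaMVP's two-stage pre-training or its yield-prediction model.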

Pages: 16
