Prediction of chemical reaction yields with large-scale multi-view pre-training

Cited by: 4
Authors
Shi, Runhan [1, 2]
Yu, Gufeng [1, 2]
Huo, Xiaohong [3]
Yang, Yang [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Key Lab Shanghai Educ Commiss Intelligent Interact, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai Key Lab Mol Engn & Chiral Drugs, Frontiers Sci Ctr Transformat Mol, Sch Chem & Chem Engn, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Chemical reaction yield prediction; Self-supervised learning; Multi-view; INFORMATION; LANGUAGE; SMILES; MODEL;
DOI
10.1186/s13321-024-00815-2
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have been recently highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions.
Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.
Pages: 16