Molecular pretraining models towards molecular property prediction

Cited by: 0
Authors
Qiao, Jianbo [1]
Gao, Wenjia [1]
Jin, Junru [1]
Wang, Ding [1]
Guo, Xu [1]
Manavalan, Balachandran [2]
Wei, Leyi [3, 4]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
[2] Sungkyunkwan Univ, Coll Biotechnol & Bioengn, Dept Integrat Biotechnol, Suwon 16419, South Korea
[3] Macao Polytech Univ, Fac Appl Sci, Macau 999078, Peoples R China
[4] Shandong Univ, Joint SDU NTU Ctr Artificial Intelligence Res C FA, Jinan 250101, Peoples R China
Funding
National Research Foundation, Singapore; National Natural Science Foundation of China
Keywords
molecular pretraining models; molecular property prediction; graph neural network (GNN); graph Transformer; PubChem; MoleculeNet; DATABASE
DOI
10.1007/s11432-024-4457-2
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Molecular property prediction plays a pivotal role in advancing our understanding of molecular representations, serving as a key driver for progress in drug discovery. Leveraging deep learning to gain comprehensive insights into molecular properties has become increasingly critical. Recent breakthroughs in molecular property prediction have been achieved through molecular pretraining models, which utilize large-scale databases of unlabeled molecules for pretraining, followed by fine-tuning for specific downstream tasks. These models enable a deeper understanding of molecular properties. In this study, we review recent advancements in molecular property prediction using molecular pretraining models. Our focus includes molecular descriptors, the impact of pretraining dataset size, molecular characterization model architectures, and the diversity of pretraining task types. Additionally, we compare the performance of existing methods and propose future directions to enhance the effectiveness of molecular pretraining models.
Pages: 19