Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1 ]
Zhang, Junjie [1 ]
Tang, Buzhou [2 ,3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Molecular representation learning has remarkably accelerated drug analysis and discovery. It applies machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Because labeled molecular data are scarce, self-supervised molecular pre-training is promising: it can exploit large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, some limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying the subgraphs that are important to specific chemical properties is challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments on molecular property prediction, covering 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of the proposed MGIB.
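For intuition only (the notation below is assumed for illustration and is not taken from the paper's text): a multi-view graph information bottleneck objective of the kind the abstract describes can be sketched as

\min_{Z_a,\, Z_m} \;\; \beta \big[ I(Z_a; G_a) + I(Z_m; G_m) \big] \;-\; \big[ I(Z_a; G_m) + I(Z_m; G_a) \big],

where G_a and G_m denote the atom-view and motif-view graphs of the same molecule, Z_a and Z_m are the compressed core-subgraph representations, I(\cdot\,;\cdot) is mutual information, and \beta is a compression trade-off coefficient. The first bracket compresses each view so that view-specific noise is discarded, while the second bracket preserves the information shared across the two views, which is what makes the objective self-supervised: no property labels appear in it.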
Pages: 7659-7669
Number of pages: 11