Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1

Authors:

Zang, Xuan [1]

Zhang, Junjie [1]

Tang, Buzhou [2,3]

Affiliations:

[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China

[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China

[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China

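Source:

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS | 2024 / Vol. 28 / Issue 12

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords: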

Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; graph neural networks; molecular property prediction; molecular pre-training;

DOI:

10.1109/JBHI.2024.3422488

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Molecular representation learning has remarkably accelerated the development of drug analysis and discovery. It implements machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Due to the scarcity of labeled molecular data, self-supervised molecular pre-training is promising as it can handle large-scale unlabeled molecular data to prompt representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, there still exist some limitations. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying subgraphs that are important to specific chemical properties is also challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments involving molecular property prediction, including 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of our proposed MGIB.

Pages: 7659-7669

Number of pages: 11

