Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1 ]
Zhang, Junjie [1 ]
Tang, Buzhou [2 ,3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; graph neural networks; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Molecular representation learning has remarkably accelerated drug analysis and discovery. It applies machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Because labeled molecular data are scarce, self-supervised molecular pre-training is promising: it can exploit large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying the subgraphs that are important to specific chemical properties remains challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments on molecular property prediction, covering 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of the proposed MGIB.
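The abstract describes an information-bottleneck-style objective: representations of the two views (atom and motif) of a molecule should agree on the compressed core subgraph, while the compression step should discard as much of the input graph as possible. The sketch below is an illustrative, simplified rendering of that trade-off, not the paper's actual loss: all function and parameter names (`mgib_style_loss`, `keep_prob`, `beta`, the Bernoulli prior) are assumptions for exposition, using an InfoNCE term for cross-view agreement and a KL penalty on node-retention probabilities for compression.

```python
import numpy as np

def infonce(z1, z2, tau=0.5):
    """Cross-view agreement: matched rows of z1/z2 are positive pairs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def compression_penalty(p, prior=0.5):
    """KL(Bernoulli(p) || Bernoulli(prior)) per node, averaged.

    Penalizes keeping every node, pushing the learnable compression
    toward a compact core subgraph.
    """
    p = np.clip(p, 1e-6, 1 - 1e-6)
    kl = p * np.log(p / prior) + (1 - p) * np.log((1 - p) / (1 - prior))
    return kl.mean()

def mgib_style_loss(z_atom, z_motif, keep_prob, beta=0.1):
    """IB-style trade-off: maximize cross-view agreement on the compressed
    subgraph embeddings while minimizing information kept from the input."""
    return infonce(z_atom, z_motif) + beta * compression_penalty(keep_prob)
```

Here `beta` plays the usual information-bottleneck role of balancing predictive agreement against compression; a larger `beta` yields sparser extracted subgraphs.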
Pages: 7659-7669
Page count: 11