Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1]
Zhang, Junjie [1]
Tang, Buzhou [2, 3]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; graph neural networks; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Molecular representation learning has remarkably accelerated the development of drug analysis and discovery. It applies machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Due to the scarcity of labeled molecular data, self-supervised molecular pre-training is promising as it can handle large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, some limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying subgraphs that are important to specific chemical properties is also challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments involving molecular property prediction, including 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of our proposed MGIB.
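For context, the graph information bottleneck mentioned in the abstract is commonly written in the form below. This is the generic supervised formulation from the literature, given only as a sketch; it is not necessarily the objective MGIB optimizes, and the symbols G (input graph), G_sub (extracted subgraph), Y (prediction target), and beta (trade-off weight) are notational assumptions that do not appear in this record.

```latex
% Generic graph information bottleneck (assumed notation, not taken from the paper):
% keep the subgraph informative about the target Y while compressing the input graph G.
\max_{G_{\mathrm{sub}}} \; I(G_{\mathrm{sub}};\, Y) \;-\; \beta \, I(G_{\mathrm{sub}};\, G),
\qquad \beta > 0
```

Here I(·;·) denotes mutual information. The first term rewards subgraphs that stay predictive of the target, and the second penalizes information retained from the full graph. In a self-supervised setting no label Y is available, so the target has to be replaced by a label-free signal; the abstract does not say which signal the authors use.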
Pages: 7659-7669
Page count: 11
Related papers
47 in total
  • [21] Self-Supervised Pre-Training with Bridge Neural Network for SAR-Optical Matching
    Qian, Lixin
    Liu, Xiaochun
    Huang, Meiyu
    Xiang, Xueshuang
    REMOTE SENSING, 2022, 14 (12)
  • [22] Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction
    Shen, Ao
    Yuan, Mingzhi
    Ma, Yingfan
    Du, Jie
    Wang, Manning
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (04)
  • [23] Self-Supervised Pre-Training for 3-D Roof Reconstruction on LiDAR Data
    Yang, Hongxin
    Huang, Shangfeng
    Wang, Ruisheng
    Wang, Xin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [24] Empower Post-hoc Graph Explanations with Information Bottleneck: A Pre-training and Fine-tuning Perspective
    Wang, Jihong
    Luo, Minnan
    Li, Jundong
    Lin, Yun
    Dong, Yushun
    Dong, Jin Song
    Zheng, Qinghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 2349 - 2360
  • [25] CoSleep: A Multi-View Representation Learning Framework for Self-Supervised Learning of Sleep Stage Classification
    Ye, Jianan
    Xiao, Qinfeng
    Wang, Jing
    Zhang, Hongjun
    Deng, Jiaoxue
    Lin, Youfang
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 189 - 193
  • [26] Pre-training molecular representation model with spatial geometry for property prediction
    Li, Yishui
    Wang, Wei
    Liu, Jie
    Wu, Chengkun
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2024, 109
  • [27] Automated 3D Pre-Training for Molecular Property Prediction
    Wang, Xu
    Zhao, Huan
    Tu, Wei-wei
    Yao, Quanming
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 2419 - 2430
  • [28] Triple Generative Self-Supervised Learning Method for Molecular Property Prediction
    Xu, Lei
    Xia, Leiming
    Pan, Shourun
    Li, Zhen
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2024, 25 (07)
  • [29] Multi-task self-supervised learning based fusion representation for Multi-view clustering
    Guo, Tianlong
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    INFORMATION SCIENCES, 2025, 694
  • [30] M2Mol: Multi-view Multi-granularity Molecular Representation Learning for Property Prediction
    Zhang, Ran
    Wang, Xuezhi
    Liu, Kunpeng
    Zhou, Yuanchun
    Wang, Pengfei
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT VII, DASFAA 2024, 2024, 14856 : 264 - 274