Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1 ]
Zhang, Junjie [1 ]
Tang, Buzhou [2 ,3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; graph neural networks; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Molecular representation learning has remarkably accelerated drug analysis and discovery. It applies machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Because labeled molecular data are scarce, self-supervised molecular pre-training is promising: it can exploit large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying the subgraphs that are important to specific chemical properties remains challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments on molecular property prediction, covering 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of the proposed MGIB.
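The abstract describes an information-bottleneck-style objective: representations of the two views (atom and motif) of a molecule should agree on the compressed core subgraph, while the compression step should discard as much of the input graph as possible. The sketch below is an illustrative, simplified rendering of that trade-off, not the paper's actual loss: all function and parameter names (`mgib_style_loss`, `keep_prob`, `beta`, the Bernoulli prior) are assumptions for exposition, using an InfoNCE term for cross-view agreement and a KL penalty on node-retention probabilities for compression.

```python
import numpy as np

def infonce(z1, z2, tau=0.5):
    """Cross-view agreement: matched rows of z1/z2 are positive pairs."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def compression_penalty(p, prior=0.5):
    """KL(Bernoulli(p) || Bernoulli(prior)) per node, averaged.

    Penalizes keeping every node, pushing the learnable compression
    toward a compact core subgraph.
    """
    p = np.clip(p, 1e-6, 1 - 1e-6)
    kl = p * np.log(p / prior) + (1 - p) * np.log((1 - p) / (1 - prior))
    return kl.mean()

def mgib_style_loss(z_atom, z_motif, keep_prob, beta=0.1):
    """IB-style trade-off: maximize cross-view agreement on the compressed
    subgraph embeddings while minimizing information kept from the input."""
    return infonce(z_atom, z_motif) + beta * compression_penalty(keep_prob)
```

Here `beta` plays the usual information-bottleneck role of balancing predictive agreement against compression; a larger `beta` yields sparser extracted subgraphs.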
Pages: 7659-7669
Page count: 11