Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1 ]
Zhang, Junjie [1 ]
Tang, Buzhou [2 ,3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Molecular representation learning has remarkably accelerated drug analysis and discovery. It applies machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Because labeled molecular data are scarce, self-supervised molecular pre-training is promising: it can exploit large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, some limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying the subgraphs that are important to specific chemical properties is challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments on molecular property prediction, covering 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of the proposed MGIB.
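For intuition only (the notation below is assumed for illustration and is not taken from the paper's text): a multi-view graph information bottleneck objective of the kind the abstract describes can be sketched as

\min_{Z_a,\, Z_m} \;\; \beta \big[ I(Z_a; G_a) + I(Z_m; G_m) \big] \;-\; \big[ I(Z_a; G_m) + I(Z_m; G_a) \big],

where G_a and G_m denote the atom-view and motif-view graphs of the same molecule, Z_a and Z_m are the compressed core-subgraph representations, I(\cdot\,;\cdot) is mutual information, and \beta is a compression trade-off coefficient. The first bracket compresses each view so that view-specific noise is discarded, while the second bracket preserves the information shared across the two views, which is what makes the objective self-supervised: no property labels appear in it.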
Pages: 7659-7669
Number of pages: 11