Multi-Network Graph Contrastive Learning for Cancer Driver Gene Identification

Cited by: 11
Authors
Peng, Wei [1, 2]
Zhou, Zhengnan [1, 2]
Dai, Wei [1, 2]
Yu, Ning [3]
Wang, Jianxin [4, 5]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650093, Peoples R China
[2] Kunming Univ Sci & Technol, Comp Technol Applicat Key Lab Yunnan Prov, Kunming 650093, Peoples R China
[3] State Univ New York, Dept Comp Sci, Coll Brockport, Brockport, NY 14422 USA
[4] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[5] Cent South Univ, Hunan Prov Key Lab Bioinformat, Changsha 410083, Peoples R China
Source
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING | 2024 / Vol. 11 / No. 4
Funding
National Natural Science Foundation of China
Keywords
Cancer; Self-supervised learning; Semantics; Bioinformatics; Proteins; Data augmentation; Predictive models; Cancer driver genes; multi-view gene network; graph contrastive learning; network integration;
DOI
10.1109/TNSE.2024.3373652
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Identifying the driver genes that contribute to the occurrence and development of cancer plays a critical role in cancer research and treatment. Some recent computational approaches identify cancer driver genes from gene networks, assuming that these genes perform essential functions within the networks. Because gene function networks are noisy, many works integrate gene networks derived from multi-omics datasets to improve the accuracy of cancer driver gene detection. However, most of them ignore the interactions between the information carried by these multi-omics datasets. In this work, we propose MNGCL, a Multi-Network Graph Contrastive Learning method for identifying cancer driver genes. MNGCL first constructs three gene networks, treated as different views, from protein interactions, gene semantic similarities, and gene co-occurrence in signaling pathways. It then augments these gene networks and feeds them into a graph contrastive learning (GCL) encoder with shared parameters, which learns gene feature representations that are consistent across the networks from a holistic perspective. Next, the gene features from the GCL encoder are passed through three separate graph convolutional networks to produce network-specific gene feature representations. Finally, a logistic regression model fuses the representations generated for each network to predict cancer driver genes. Experimental results show that MNGCL achieves a higher area under the ROC curve (AUC) and area under the precision-recall curve (AUPRC) than existing methods in identifying driver genes for both pan-cancer and single cancer types. Furthermore, ablation studies show that capturing the dependencies and interactions between gene networks gives our model a more comprehensive view of the molecular mechanisms underlying cancer and improves the accuracy of cancer driver gene identification.
Pages: 3430-3440
Page count: 11
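The abstract describes the representation-learning stage only at a high level. As a rough illustration, the NumPy sketch below shows the generic ingredients of multi-view graph contrastive learning: a two-layer GCN encoder whose weights are shared by all views, edge dropping and feature masking as augmentations, and an InfoNCE loss that treats the same gene in two views as a positive pair. The augmentation choices, the InfoNCE objective, contrasting every pair of networks, the layer sizes, and the random toy networks are all assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def drop_edges(adj, p, rng):
    """Augmentation (assumed): remove each undirected edge independently with probability p."""
    keep = np.triu(rng.random(adj.shape) >= p, 1)
    keep = keep | keep.T  # keep the augmented graph symmetric
    return adj * keep


def mask_features(x, p, rng):
    """Augmentation (assumed): zero out each feature column with probability p."""
    keep = rng.random(x.shape[1]) >= p
    return x * keep[None, :]


def gcn_encoder(adj, x, weights):
    """Two-layer GCN; passing the same `weights` for every view shares the parameters."""
    a_norm = normalize_adjacency(adj)
    h = np.maximum(a_norm @ x @ weights[0], 0.0)  # ReLU
    return a_norm @ h @ weights[1]


def info_nce(z1, z2, tau=0.5):
    """InfoNCE (assumed objective): gene i in view 1 and gene i in view 2 form the
    positive pair; all other genes in view 2 act as negatives."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    sim = z1 @ z2.T / tau  # cosine similarities scaled by the temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # positives lie on the diagonal


# Toy usage with hypothetical random data standing in for the three gene networks.
rng = np.random.default_rng(0)
n_genes, n_feats, n_hidden, n_out = 50, 16, 32, 8
x = rng.normal(size=(n_genes, n_feats))


def random_network(density):
    upper = np.triu(rng.random((n_genes, n_genes)) < density, 1)
    return (upper | upper.T).astype(float)


views = {"ppi": random_network(0.1), "semantic": random_network(0.1),
         "pathway": random_network(0.1)}
shared_weights = [rng.normal(scale=0.1, size=(n_feats, n_hidden)),
                  rng.normal(scale=0.1, size=(n_hidden, n_out))]
embeddings = {name: gcn_encoder(drop_edges(adj, 0.2, rng),
                                mask_features(x, 0.2, rng), shared_weights)
              for name, adj in views.items()}
names = list(embeddings)
# Assumption: contrast every pair of networks and average the pairwise losses.
loss = np.mean([info_nce(embeddings[a], embeddings[b])
                for i, a in enumerate(names) for b in names[i + 1:]])
print(f"multi-view contrastive loss: {loss:.4f}")
```

Only the forward pass is shown. A working model would minimize this loss with respect to `shared_weights` (and train the downstream network-specific GCNs) using an automatic-differentiation framework.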
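The final fusion step and the two reported metrics can be illustrated in the same spirit. This NumPy sketch assumes that each network-specific GCN emits one driver score per gene, so a logistic regression trained by gradient descent only has to learn how to weight three scores. AUC is computed with the rank (Mann-Whitney) formulation, and AUPRC is approximated by average precision. The synthetic scores, labels, train/test split, and hyperparameters are all hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic_regression(features, labels, lr=0.1, epochs=2000, l2=1e-3):
    """Batch gradient descent on the L2-regularized logistic (cross-entropy) loss."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        residual = sigmoid(features @ w + b) - labels  # dLoss/dLogit per gene
        w -= lr * (features.T @ residual / n + l2 * w)
        b -= lr * residual.mean()
    return w, b


def roc_auc(labels, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimate: mean precision@k over the ranks k at which positives appear."""
    sorted_labels = labels[np.argsort(-scores, kind="stable")]
    precision_at_k = np.cumsum(sorted_labels) / np.arange(1, len(scores) + 1)
    return precision_at_k[sorted_labels == 1].mean()


# Hypothetical data: 200 genes, 40 labelled drivers, and one score per gene from
# each network-specific GCN (PPI, semantic similarity, pathway co-occurrence).
rng = np.random.default_rng(42)
labels = np.zeros(200, dtype=int)
labels[:40] = 1
rng.shuffle(labels)
network_scores = np.column_stack(
    [strength * labels + rng.normal(size=200) for strength in (2.0, 1.5, 1.0)])

train, test = np.arange(140), np.arange(140, 200)
w, b = fit_logistic_regression(network_scores[train], labels[train])
fused = sigmoid(network_scores[test] @ w + b)
fused_auc = roc_auc(labels[test], fused)
fused_auprc = average_precision(labels[test], fused)
print(f"fusion weights: {np.round(w, 3)}")
print(f"held-out AUC: {fused_auc:.3f}  AUPRC: {fused_auprc:.3f}")
```

Average precision is only one way to estimate AUPRC (trapezoidal interpolation of the precision-recall curve is another), so AUPRC values reported by different toolkits can differ slightly for the same predictions.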