SMG: self-supervised masked graph learning for cancer gene identification

Cited by: 9
Authors
Cui, Yan [1]
Wang, Zhikang [2, 3]
Wang, Xiaoyu [2, 3]
Zhang, Yiwen [4, 5]
Zhang, Ying [6]
Pan, Tong [2, 3]
Zhang, Zhe [7]
Li, Shanshan [8]
Guo, Yuming [8]
Akutsu, Tatsuya [1]
Song, Jiangning [3, 9]
Affiliations
[1] Kyoto Univ, Bioinformat Ctr, Inst Chem Res, Kyoto, Japan
[2] Monash Univ, Dept Biochem & Mol Biol, Clayton, Vic, Australia
[3] Monash Univ, Biomed Discovery Inst, Clayton, Vic, Australia
[4] Monash Univ, Sch Publ Hlth & Prevent Med, Clayton, Vic, Australia
[5] Climate Air Qual Res Grp, Clayton, Vic, Australia
[6] Nanjing Univ Sci & Technol, Sch Comp Sci Engn, Nanjing, Peoples R China
[7] UniDTCo Ltd, Seoul, South Korea
[8] Monash Climate Air Qual Res CARE Unit, Global Environm Hlth & Biostat, Clayton, Vic, Australia
[9] Monash Univ, Monash Data Futures Inst, Clayton, Vic, Australia
Funding
Australian Research Council; UK Medical Research Council
Keywords
cancer genes; self-supervised learning; graph learning; representation learning; protein-protein interaction network; PI3K/AKT signaling pathway; breast cancer; somatic mutations; discovery; medicine; database; update
DOI
10.1093/bib/bbad406
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advances in deep learning have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because labelled data are limited, pinpointing CGs among a multitude of potential mutations remains exceptionally challenging. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (the pretext task) and task-specific fine-tuning (the downstream task). In the pretext task, nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed by a graph neural network (GNN)-based autoencoder, which explores node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, and a task-specific layer produces the final prediction. To assess the performance of SMG, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetworks) across eight PPI networks. These experiments, together with comparisons against existing state-of-the-art methods, demonstrate the superiority of SMG on multi-omic feature engineering.
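The pretext task described in the abstract (mask random nodes, then reconstruct them from their neighbours) can be illustrated with a small sketch. This is not the SMG implementation: the trained GNN autoencoder is replaced here by a single parameter-free mean-aggregation step, the mask token is assumed to be a zero vector, and the function names `mask_nodes`, `mean_aggregate` and `masked_mse`, the toy graph and the 50% mask rate are all hypothetical choices made for illustration.

```python
import random


def mask_nodes(features, mask_rate, mask_token, seed=0):
    """Replace a random subset of node feature vectors with a mask token.

    Returns a masked copy of the features and the sorted masked indices.
    The input list is not modified.
    """
    rng = random.Random(seed)
    n = len(features)
    n_mask = max(1, round(mask_rate * n))
    masked_idx = sorted(rng.sample(range(n), n_mask))
    masked = [list(row) for row in features]
    for i in masked_idx:
        masked[i] = list(mask_token)
    return masked, masked_idx


def mean_aggregate(features, edges):
    """One message-passing step: average each node with its neighbours.

    Stand-in (assumption) for a learned GNN encoder/decoder; it has no weights.
    """
    n = len(features)
    neighbours = [[i] for i in range(n)]  # include a self-loop
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nb) / len(nb) for d in range(dim)]
        for nb in neighbours
    ]


def masked_mse(original, reconstructed, masked_idx):
    """Mean squared reconstruction error, measured only on masked nodes."""
    total, count = 0.0, 0
    for i in masked_idx:
        for a, b in zip(original[i], reconstructed[i]):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy PPI-like graph: 4 genes, 2 features each (values are made up).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = [(0, 1), (1, 2), (2, 3)]
mask_token = [0.0, 0.0]  # assumed mask token

masked, masked_idx = mask_nodes(features, mask_rate=0.5, mask_token=mask_token)
reconstructed = mean_aggregate(masked, edges)
loss = masked_mse(features, reconstructed, masked_idx)
print("masked nodes:", masked_idx, "reconstruction loss:", loss)
```

In a real pipeline the aggregation step would be a trainable encoder and decoder, and the loss on the masked nodes would drive gradient updates. The pre-trained encoder would then be reused under a task-specific prediction layer, as the abstract describes for the downstream tasks.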
Pages: 13