Cancer subtype identification by multi-omics clustering based on interpretable feature and latent subspace learning

Cited by: 2
Authors
Shi, Tianyi [1]
Ye, Xiucai [1]
Huang, Dong [1]
Sakurai, Tetsuya [1]
Affiliation
[1] Univ Tsukuba, Dept Comp Sci, Tsukuba 3058577, Japan
Keywords
Multi-omics clustering; Cancer subtyping; Interpretable features; SHAP values; Latent subspace learning; COLLECTING DUCT CARCINOMA; VALIDATION; PRECISION; PACKAGE
DOI
10.1016/j.ymeth.2024.09.014
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
In recent years, multi-omics clustering has become a powerful tool in cancer research, offering a comprehensive view of the diverse molecular characteristics of different cancer subtypes. However, most existing multi-omics clustering methods integrate heterogeneous features from different omics directly, which makes them vulnerable to noise and redundancy in multi-omics data and can lead to poor clustering results. We therefore propose a novel multi-omics clustering method that extracts interpretable and discriminative features from each omics before data integration. Clinical information is used to supervise feature extraction based on SHAP (SHapley Additive exPlanations) values. Singular value decomposition (SVD) is then applied to integrate the extracted features of the different omics by constructing a latent subspace. Finally, shared nearest neighbor-based spectral clustering is applied to the latent representation to obtain the clustering result. The proposed method is evaluated on several cancer datasets spanning three levels of omics and compared with several state-of-the-art multi-omics clustering methods. The comparison demonstrates the superior performance of the proposed method in multi-omics data analysis for cancer subtyping. Additionally, experiments show that using clinical information to guide SHAP-based feature extraction improves clustering performance. Moreover, enrichment analysis of the gene signatures identified in different subtypes further demonstrates the effectiveness of the proposed method. Availability: The proposed method is freely available at https://github.com/Tianyi-Shi-Tsukuba/Multi-omics-clustering-based-on-SHAP. Data will be made available on request.
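The three steps described in the abstract (clinically supervised SHAP-based feature extraction per omics, SVD integration into a latent subspace, and shared nearest neighbor spectral clustering) can be illustrated with a minimal NumPy-only sketch. This is not the authors' implementation and rests on stated assumptions: the abstract does not name the model whose SHAP values are computed, so a ridge regression of a numeric clinical variable is swapped in, because for a linear model with independent features the SHAP value of feature j on sample i has the closed form w_j (x_ij - mean_j). The standardize-then-SVD latent construction, the SNN weighting (fraction of shared k-nearest neighbors), and all function names and parameters (linear_shap_select, svd_latent, snn_affinity, cluster_multiomics, n_keep, rank, k) are hypothetical.

```python
import numpy as np

# Minimal sketch under stated assumptions; not the authors' implementation.


def linear_shap_select(X, y, n_keep, ridge=1.0):
    """Return indices of the n_keep features of X with the largest mean |SHAP|.

    Assumption: ridge regression of a numeric clinical variable y stands in for
    the supervised model. For a linear model with independent features, the
    SHAP value of feature j on sample i is w_j * (x_ij - mean_j).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = X.shape[0]
    # Dual-form ridge solution, cheap when features outnumber samples.
    w = Xc.T @ np.linalg.solve(Xc @ Xc.T + ridge * np.eye(n), yc)
    shap_values = Xc * w                        # (samples, features)
    importance = np.abs(shap_values).mean(axis=0)
    return np.argsort(importance)[::-1][:n_keep]


def svd_latent(blocks, rank):
    """Assumed integration: concatenate standardized blocks, keep top-rank SVD scores."""
    Z = np.hstack([(B - B.mean(axis=0)) / (B.std(axis=0) + 1e-8) for B in blocks])
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :rank] * s[:rank]


def snn_affinity(L, k):
    """Assumed SNN weighting: fraction of shared k-nearest neighbors (self included)."""
    n = L.shape[0]
    D = ((L[:, None, :] - L[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)                 # a point is not its own k-NN
    knn = np.argsort(D, axis=1)[:, :k]
    M = np.zeros((n, n))
    M[np.arange(n)[:, None], knn] = 1.0         # M[i, j] = 1 if j is a k-NN of i
    np.fill_diagonal(M, 1.0)                    # include each point in its own set
    S = (M @ M.T) / (k + 1)                     # shared-neighbor count / set size
    np.fill_diagonal(S, 0.0)
    return S


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(n_clusters)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def spectral_clustering(S, n_clusters, seed=0):
    """Ng-Jordan-Weiss spectral clustering on a precomputed affinity matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1) + 1e-12)
    A = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 S D^-1/2
    _, vecs = np.linalg.eigh(A)                          # ascending eigenvalues
    V = vecs[:, -n_clusters:]                            # top eigenvectors
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return kmeans(V, n_clusters, seed=seed)


def cluster_multiomics(omics, y, n_keep=20, rank=5, k=10, n_clusters=3):
    """Hypothetical pipeline: per-omics SHAP selection, SVD latent space, SNN spectral."""
    selected = [X[:, linear_shap_select(X, y, n_keep)] for X in omics]
    latent = svd_latent(selected, rank)
    return spectral_clustering(snn_affinity(latent, k), n_clusters)
```

For example, with omics a list of sample-by-feature matrices whose rows refer to the same patients and y a numeric clinical variable, cluster_multiomics(omics, y, n_keep=20, rank=5, k=10, n_clusters=3) returns one cluster label per sample. The closed-form linear SHAP values keep the sketch dependency-free; a nonlinear model explained with a SHAP library could be substituted in linear_shap_select without changing the later steps.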
Pages: 144-153
Number of pages: 10