AVBAE-MODFR: A novel deep learning framework of embedding and feature selection on multi-omics data for pan-cancer classification

Cited by: 1
Authors
Li M. [1]
Guo H. [1]
Wang K. [1]
Kang C. [1]
Yin Y. [2]
Zhang H. [1]
Affiliations
[1] National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, Engineering Research Center of Trusted Behavior Intelligence, Ministry of Education, College of Artificial Intelligence, Nankai University, Tongyan Road, Tianjin
[2] Department of Food Science and Technology, University of Nebraska - Lincoln, NE
Funding
National Natural Science Foundation of China
Keywords
Deep learning; Feature importance ranking; Multi-omics; Pan-cancer classification; Variational autoencoders
DOI
10.1016/j.compbiomed.2024.108614
Abstract
Integrative analysis of cancer multi-omics data for pan-cancer classification has potential clinical applications such as tumor diagnosis, analysis of clinically significant features, and precision medicine. These applications require embedding and feature selection on high-dimensional multi-omics data. Deep learning algorithms have recently become the most promising methods for integrative analysis of cancer multi-omics data because of their powerful capability to capture nonlinear relationships. However, given the high dimensionality and heterogeneity of the data, developing effective deep learning architectures for cancer multi-omics embedding and feature selection remains a challenge. In this paper, we propose a novel two-phase deep learning model named AVBAE-MODFR for pan-cancer classification. AVBAE-MODFR achieves embedding with a multi2multi autoencoder based on the adversarial variational Bayes method and then performs feature selection with a dual-net-based feature ranking method. AVBAE-MODFR uses AVBAE to pre-train the network parameters, which improves classification performance and enhances the stability of feature ranking in MODFR. First, AVBAE learns a high-quality representation across multiple omics features for unsupervised pan-cancer classification; we design an efficient discriminator architecture that distinguishes the latent distributions in order to update the forward variational parameters. Second, we propose MODFR, which evaluates the importance of multi-omics features simultaneously for feature selection by training a purpose-designed multi2one selector network; its efficient evaluation approach, based on the average gradient over random mask subsets, avoids the bias caused by input feature drift. We conduct experiments on the TCGA pan-cancer dataset and compare each phase with four state-of-the-art methods. The results show the superiority of AVBAE-MODFR over these state-of-the-art methods. © 2024 Elsevier Ltd
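Illustrative note: the abstract describes scoring feature importance by averaging gradients over random mask subsets. The sketch below shows only that general idea and is not the authors' MODFR implementation. It assumes a fixed logistic model in place of the paper's dual-net selector and operator networks, and every name in it (`mask_gradient_importance`, `keep_prob`, the synthetic data) is hypothetical.

```python
import numpy as np

def mask_gradient_importance(X, y, w, b=0.0, n_masks=64, keep_prob=0.5, seed=0):
    """Average |d loss / d mask_j| over random binary feature masks.

    Assumption: a fixed logistic model p = sigmoid((x * mask) @ w + b)
    with mean binary cross-entropy loss; this stands in for the trained
    networks described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    total = np.zeros(n_features)
    for _ in range(n_masks):
        # Draw a random subset of features to keep (1) or drop (0).
        mask = (rng.random(n_features) < keep_prob).astype(float)
        logits = (X * mask) @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        # Chain rule: dL/dmask_j = mean_i (p_i - y_i) * x_ij * w_j
        grad_mask = ((probs - y)[:, None] * X * w).mean(axis=0)
        total += np.abs(grad_mask)
    return total / n_masks

# Synthetic example: only features 0 and 1 influence the label.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
w = np.array([3.0, -3.0, 0.0, 0.0, 0.0])
y = (X @ w > 0).astype(float)

importance = mask_gradient_importance(X, y, w)
ranking = np.argsort(-importance)  # most important feature first
print(ranking[:2])
```

Averaging over many masks, rather than evaluating a single one, means each feature is scored both when it is present and when it is absent alongside different subsets of the other features.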