VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis

Times cited: 0
Authors
Shi, Xinyuan [1]
Zhu, Fangfang [2]
Min, Wenwen [1]
Affiliations
[1] Yunnan University, School of Information Science and Engineering, Kunming, People's Republic of China
[2] Yunnan Open University, School of Health and Nursing, Kunming, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
microbiome data; pretraining; variational autoencoder; Transformer; multihead-co-attention; saliency map
DOI
10.1089/cmb.2024.0884
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Predicting survival outcomes and assessing patient risk play a pivotal role in understanding how microbial composition varies across the stages of cancer. With ongoing advances in deep learning, it has been shown that deep learning has the potential to analyze patient survival risk from microbial data. However, individual cancer datasets commonly suffer from a limited sample size and a high-dimensional feature space. This combination often causes deep learning models to overfit, which hinders their ability to extract rich data representations and results in suboptimal performance. To overcome these challenges, we advocate pretraining and fine-tuning strategies, which have proven effective in mitigating the small sample sizes of individual cancer datasets. In this study, we propose VTrans, a deep learning model that combines a Transformer encoder with a variational autoencoder (VAE) and employs both pretraining and fine-tuning to predict the survival risk of cancer patients from microbial data. Furthermore, we highlight the potential of extending VTrans to integrate microbial multi-omics data. We evaluate our method on three distinct cancer datasets from The Cancer Genome Atlas (TCGA) Program, and the findings demonstrate that (1) VTrans outperforms conventional machine learning and other deep learning models; (2) pretraining significantly enhances its performance; (3) VAE encoding is more effective than positional encoding in enriching the data representation; and (4) following the idea of saliency maps, it is possible to identify which microbes contribute most to the classification results. These results demonstrate the effectiveness of VTrans in predicting patient survival risk.
Source code and all datasets used in this paper are available at https://github.com/wenwenmin/VTrans and https://doi.org/10.5281/zenodo.14166580.
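The abstract does not specify how VTrans computes its saliency maps. As a rough illustration of the general idea (vanilla gradient saliency), the sketch below scores each input microbe by the absolute gradient of a model's output with respect to that feature. To stay within the Python standard library, the gradient is estimated with central finite differences. The logistic "risk model", its weights, the sample abundances, and the taxon names are all hypothetical placeholders, not VTrans; with a trained network one would normally obtain these gradients through automatic differentiation.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_risk_model(weights: Sequence[float],
                        bias: float) -> Callable[[Sequence[float]], float]:
    """Hypothetical stand-in for a trained classifier: maps a vector of
    microbial abundances to a predicted probability of the high-risk class."""
    def model(x: Sequence[float]) -> float:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return sigmoid(z)
    return model


def saliency_map(model: Callable[[Sequence[float]], float],
                 x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Vanilla saliency: absolute gradient of the model output with respect
    to each input feature, estimated by central finite differences."""
    scores = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad_i = (model(up) - model(down)) / (2.0 * eps)
        scores.append(abs(grad_i))
    return scores


def rank_features(names: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return feature names ordered from highest to lowest saliency."""
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order]


if __name__ == "__main__":
    microbes = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]  # hypothetical names
    model = make_toy_risk_model(weights=[0.2, -1.5, 0.7, 0.05], bias=-0.1)
    sample = [0.3, 0.1, 0.4, 0.2]  # hypothetical relative abundances
    scores = saliency_map(model, sample)
    # prints taxon_B, taxon_C, taxon_A, taxon_D (one per line)
    for name in rank_features(microbes, scores):
        print(name)
```

For this linear-logistic toy model the gradient is σ'(z)·w_i, so the ranking simply follows |w_i|. With a nonlinear model such as a Transformer, a feature's saliency can change from sample to sample, which is why saliency maps are usually inspected per sample or averaged over a cohort.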
Pages: 15