An integrated ensemble learning technique for gene expression classification and biomarker identification from RNA-seq data for pancreatic cancer prognosis

Cited by: 4
Authors
JagadeeswaraRao G. [1, 2]
Sivaprasad A. [3 ]
Affiliations
[1] AUTDRH, Andhra University, Visakhapatnam
[2] Department of IT, Aditya Institute of Technology and Management, Tekkali
[3] Department of Computer Science, Dr. V.S. Krishna Govt. Degree College, Visakhapatnam
Keywords
Bio-ML; Biomarkers; Ensemble learning; Pancreatic cancer; RNA-seq; WGCNA
DOI
10.1007/s41870-023-01688-8
Abstract
Bio-ML is an interdisciplinary field that applies machine learning (ML) models to biological problems. Ribonucleic acid sequencing (RNA-seq) data reveal genetic mutations and complex relationships among biological processes, which can inform the diagnosis and treatment of cancer. In this paper, we propose a bio-ML approach for finding gene biomarkers in pancreatic cancer (PC). The pancreatic adenocarcinoma (PAAD) gene expression data were obtained from The Cancer Genome Atlas (TCGA) project database. We used two methods. The first is an ensemble stacking classifier with cross-validation (SCV), which combines K-nearest neighbour (KNN), random forest (RF), gradient boosting (GB), and logistic regression (LR) classifiers to classify differentially expressed genes (DEGs). The second is weighted gene co-expression network analysis (WGCNA), used to find the hub gene module. The genes reported by the two methods were intersected to find common DEGs. These DEGs were analysed using the protein-protein interaction (PPI) network, gene ontology, and pathway analysis to identify eight hub genes. The hub genes were further evaluated using Gene Expression Profiling Interactive Analysis version 2 (GEPIA2), resulting in four novel biomarkers (BUB1, BUB1B, KIF11, and TTK). We believe that integrating ML approaches into biological research is producing encouraging results and helping to resolve challenging problems. © The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2024.
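The two-method workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the data are synthetic (not TCGA-PAAD), KNN, RF, and GB are taken as base learners with LR as the meta-learner, genes are ranked by random forest feature importance, and the WGCNA module is a hypothetical placeholder set, since WGCNA itself is typically run in R.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a samples x genes expression matrix (assumption:
# real input would be normalised TCGA-PAAD expression with tumour/normal labels).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10, random_state=0
)
gene_names = np.array([f"gene_{i}" for i in range(X.shape[1])])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stacking with internal 5-fold cross-validation: out-of-fold predictions of
# the base learners become the training features of the meta-learner.
# Assumption: LR acts as the meta-learner; the abstract does not say which
# of the four classifiers plays that role.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)

# Assumption: rank genes by the fitted random forest's feature importances
# and keep the top 10 as the ML-derived gene list.
rf = stack.named_estimators_["rf"]
top_idx = np.argsort(rf.feature_importances_)[::-1][:10]
ml_genes = set(gene_names[top_idx])

# Hypothetical placeholder for the hub module genes reported by WGCNA.
wgcna_module_genes = set(gene_names[:25])

# Intersect the two gene lists to obtain the common candidate genes.
common_genes = sorted(ml_genes & wgcna_module_genes)
print(f"held-out accuracy: {accuracy:.3f}")
print("common genes:", common_genes)
```

In a real analysis, the common genes would then go to the PPI network, gene ontology, and pathway analysis steps, which this sketch does not cover.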
Pages: 1505-1516
Number of pages: 11