Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms

Cited by: 3
Authors
Maurya, Neha Shree [1]
Kushwaha, Sandeep [2]
Vetukuri, Ramesh Raju [3]
Mani, Ashutosh [1]
Affiliations
[1] Motilal Nehru National Institute of Technology Allahabad, Department of Biotechnology, Prayagraj 211004, India
[2] National Institute of Animal Biotechnology, Hyderabad 500032, India
[3] Swedish University of Agricultural Sciences, Department of Plant Breeding, S-23053 Alnarp, Sweden
Keywords
colorectal cancer; feature selection; machine learning; gene expression; gene signatures; correlation; driver mutations; expression; prognosis
DOI
10.3390/genes14101836
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Colorectal cancer (CRC), which affects the colon or rectum, is a common global health issue, with 1.1 million new cases occurring yearly. This study aimed to identify gene signatures for the early detection of CRC by applying machine learning (ML) algorithms to gene expression data. The TCGA-CRC and GSE50760 datasets were pre-processed and subjected to feature selection using the LASSO method in combination with five ML algorithms: AdaBoost, Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), and Support Vector Machine (SVM). The selected features were further examined through gene expression, correlation, and survival analyses, and the results were validated on the external dataset GSE142279. The RF model achieved the best classification accuracy on both datasets. Feature selection identified 12 candidate genes, which gene expression and correlation analyses then narrowed to 3: CA2, CA7, and ITM2C. These three genes achieved 100% accuracy on the external dataset, with AUC values of 99.24%, 100%, and 99.5%, respectively. Survival analysis of the final gene signatures yielded a significant log-rank p-value of 0.044, whereas tumor immune-cell infiltration showed only a weak correlation with signature expression. CA2, CA7, and ITM2C can therefore serve as gene signatures for the early detection of CRC and may provide valuable information for prognostic and therapeutic decision making. Further research is needed to fully understand the potential of these genes in CRC.
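The abstract reports per-gene AUC values and a log-rank p-value but does not show how either statistic is computed. The sketch below is a minimal, standard-library-only illustration of both. It is not the authors' code, and the record does not state which software or settings they used. The helper names `single_gene_auc` and `logrank_test` are hypothetical.

Two assumptions shape the sketch:
- The AUC uses the Mann-Whitney formulation, with raw expression of one gene taken directly as the tumor-vs-normal score.
- The caller decides how patients are split into two survival groups. A median split on signature expression is one common choice, but it is only an assumption here.

```python
import math
from typing import Sequence, Tuple


def single_gene_auc(expr: Sequence[float], is_tumor: Sequence[bool]) -> float:
    """AUC of one gene's expression used directly as a tumor-vs-normal score.

    Mann-Whitney form: the probability that a randomly chosen tumor sample
    has higher expression than a randomly chosen normal sample (ties = 0.5).
    For a gene expressed lower in tumors, pass the negated expression values.
    """
    pos = [x for x, t in zip(expr, is_tumor) if t]
    neg = [x for x, t in zip(expr, is_tumor) if not t]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df.

    events_* hold 1 for an observed event (e.g. death) and 0 for censoring.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # running sum of observed minus expected events in group A
    variance = 0.0   # running sum of hypergeometric variances
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)          # events at t, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)  # events at t, group A
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    print(single_gene_auc([5.1, 6.0, 7.2, 1.3, 2.2, 3.0],
                          [True, True, True, False, False, False]))  # 1.0
    chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
    print(f"chi2={chi2:.3f}, p={p:.4f}")
```

For real analyses, tested implementations are available and are preferable to hand-rolled ones. Examples include scikit-learn's `roc_auc_score` and lifelines' `lifelines.statistics.logrank_test`. The hand-written version above only makes the arithmetic behind the reported numbers explicit.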
Pages: 15