Colon cancer diagnosis and staging classification based on machine learning and bioinformatics analysis

Cited by: 86
Authors
Su, Ying [1]
Tian, Xuecong [1]
Gao, Rui [3]
Guo, Wenjia [2]
Chen, Cheng [1]
Chen, Chen [3,4]
Jia, Dongfang [1]
Li, Hongtao [2]
Lv, Xiaoyi [1,5]
Affiliations
[1] Xinjiang Univ, Coll Software, Urumqi 830046, Xinjiang, Peoples R China
[2] Xinjiang Med Univ, Affiliated Tumor Hosp, Urumqi 830011, Peoples R China
[3] Xinjiang Med Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[4] Cloud Comp Engn Technol Res Ctr Xinjiang, Kelamayi 834099, Peoples R China
[5] Xinjiang Univ, Key Lab Signal Detect & Proc, Urumqi 830046, Xinjiang, Peoples R China
Keywords
Machine learning; Colon cancer; Prognosis; WGCNA; Staging; PPI; Gene expression
DOI
10.1016/j.compbiomed.2022.105409
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Colon cancer becomes harder to treat once it has metastasized at an advanced stage. Identifying markers of colon cancer makes it possible to diagnose the cancer stage promptly and to improve prognosis through timely treatment. This paper uses gene expression profiling data from The Cancer Genome Atlas (TCGA) to diagnose colon cancer and its stage. We first selected the gene modules most strongly correlated with cancer by Weighted Gene Co-expression Network Analysis (WGCNA), extracted characteristic genes from the differential expression results using the least absolute shrinkage and selection operator (Lasso) algorithm, and performed survival analysis. We then combined the module genes with the Lasso-extracted feature genes to distinguish colon cancer from healthy controls using random forest (RF), support vector machine (SVM) and decision tree classifiers, and diagnosed colon cancer stage using the differentially expressed genes for each stage. Protein-protein interaction (PPI) networks were then constructed for 289 genes to identify protein clusters for survival analysis. The RF model performed best. In cross-validated diagnosis of colon cancer versus the control group, it achieved an average accuracy of 99.81%, an F1 value of 0.9968, a precision of 99.88% and a recall of 99.5%. In the diagnosis of colon cancer stages I, II, III and IV, it achieved an average accuracy of 91.5%, an F1 value of 0.7679, a precision of 86.94% and a recall of 73.04%. Eight genes associated with colon cancer prognosis were identified: GCNT2, GLDN, SULT1B1, UGT2B15, PTGDR2, GPR15, BMP5 and CPT2.
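The classification step described in the abstract (Lasso-based feature selection followed by RF, SVM and decision tree classifiers scored under cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of TCGA expression profiles, omits the WGCNA and differential expression steps, and picks the Lasso penalty (`alpha=0.02`), the five folds and all classifier settings arbitrarily, since the record does not state them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an expression matrix (samples x genes) with
# tumour/normal labels. Assumption: real TCGA data is NOT used here.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=8,
    n_redundant=0, class_sep=2.0, random_state=0,
)

# The three classifier families named in the abstract, with default-like
# settings chosen for illustration only.
classifiers = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

# Assumption: 5 stratified folds; the fold count is not given in the record.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, clf in classifiers.items():
    # Scaling and Lasso selection sit inside the pipeline, so they are refit
    # on each training fold and the held-out fold does not leak into them.
    pipe = make_pipeline(
        StandardScaler(),
        SelectFromModel(Lasso(alpha=0.02)),  # keep features with non-zero weight
        clf,
    )
    scores = cross_validate(pipe, X, y, cv=cv, scoring=scoring)
    results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Keeping feature selection inside the cross-validated pipeline is a design choice of this sketch: selecting genes on the full data set before splitting would tend to inflate the reported accuracy.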
Pages: 10