Enhancing metastatic colorectal cancer prediction through advanced feature selection and machine learning techniques

Cited by: 1
Authors
Yang, Hui [1, 2]
Liu, Jun [3]
Yang, Na [4, 5]
Fu, Qingsheng [3]
Wang, Yingying [6]
Ye, Mingquan [7]
Tao, Shaoneng [6]
Liu, Xiaocen [6]
Li, Qingqing [7]
Affiliations
[1] Yijishan Hosp, Affiliated Hosp 1, Wannan Med Coll, Cent Lab, Wuhu, Anhui, Peoples R China
[2] Anhui Prov Key Lab Noncoding RNA Basic & Clin Tran, Wuhu, Anhui, Peoples R China
[3] Yijishan Hosp, Affiliated Hosp 1, Wannan Med Coll, Dept Gastrointestinal Surg, Wuhu, Anhui, Peoples R China
[4] Yijishan Hosp, Affiliated Hosp 1, Wannan Med Coll, Dept Crit Care Med, Wuhu, Anhui, Peoples R China
[5] Clin Res Ctr Crit Resp Med Anhui Prov, Wuhu, Anhui, Peoples R China
[6] Yijishan Hosp, Affiliated Hosp 1, Wannan Med Coll, Dept Nucl Med, Wuhu 241001, Anhui, Peoples R China
[7] Wannan Med Coll, Res Ctr Hlth Big Data Min & Applicat, Sch Med Informat, Wuhu, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Colorectal cancer; Metastasis prediction; Feature selection; Machine learning; EXPRESSION;
DOI
10.1016/j.intimp.2024.113033
Chinese Library Classification number
R392 [Medical immunology]; Q939.91 [Immunology];
Discipline classification code
100102;
Abstract
Background and aims: Colorectal cancer (CRC) is the third most prevalent cancer globally and poses a significant challenge because of its high rate of metastasis. Approximately 20% of patients with CRC present with distant metastases at diagnosis, and over 50% develop metastases within five years. Accurate prediction of metastasis is crucial for improving survival outcomes in patients with CRC.
Methods: This study introduces a cost-sensitive fast correlation-based filter (CS-FCBF) algorithm for feature selection, integrated with machine learning techniques to predict metastatic CRC. The CS-FCBF algorithm reduced the number of genomic features from 184 to 9 critical genes: CXCL9, C2CD4B, RGCC, GFI1, BEX2, CXCL3, FOXQ1, PBK, and PLAG1. The findings were validated through in vitro and in vivo experiments and through analysis of publicly available single-cell RNA-seq datasets.
Results: Applying the CS-FCBF algorithm significantly improved prediction model performance, with an average 21.16% increase in the area under the precision-recall curve. The nine identified genes hold potential as diagnostic biomarkers and therapeutic targets for metastatic CRC.
Conclusions: This study highlights the critical role of advanced feature selection methods, combined with machine learning, in addressing the challenge of class imbalance in medical diagnosis, particularly for CRC. Early detection of metastasis is vital, and the results underscore the importance of the identified genes in the metastatic process of CRC. The methodology offers valuable insights and paves the way for future research in other cancers or diseases that face similar diagnostic challenges.
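The abstract does not describe how CS-FCBF works internally, so the following is only a minimal sketch of the standard fast correlation-based filter (FCBF) that it builds on, not the authors' cost-sensitive variant. It assumes expression values have already been discretized; the function names, the `threshold` parameter, and the toy gene names are hypothetical and chosen for illustration. FCBF ranks features by symmetric uncertainty with the class label and drops any feature that is at least as correlated with an already kept feature as it is with the label.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, labels, threshold=0.0):
    """Standard FCBF (no cost-sensitive weighting).

    features: dict mapping a feature name to a list of discrete values.
    Returns the names of the selected features, most relevant first.
    """
    relevance = {
        name: symmetric_uncertainty(col, labels)
        for name, col in features.items()
    }
    # Keep only features whose relevance exceeds the threshold,
    # ordered from most to least relevant.
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # A feature is redundant if some already kept feature predicts it
        # at least as well as the feature predicts the class label.
        redundant = any(
            symmetric_uncertainty(features[name], features[kept])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy, made-up data: 8 samples, 2 of them in the minority (positive) class.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
toy = {
    "gene_a": [0, 0, 0, 0, 0, 0, 1, 1],       # perfectly informative
    "gene_a_copy": [0, 0, 0, 0, 0, 0, 1, 1],  # redundant duplicate
    "gene_noise": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of the label
}
print(fcbf(toy, labels, threshold=0.05))
```

On this toy input the duplicate is dropped as redundant and the noise feature falls below the relevance threshold. A cost-sensitive variant would additionally weight the minority class, but how CS-FCBF does so is not stated in the abstract.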
Pages: 10