Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights

Cited by: 19
Authors
Yaqoob, Abrar [1]
Verma, Navneet Kumar [1]
Aziz, Rabia Musheer [2]
Shah, Mohd Asif [3, 4, 5]
Affiliations
[1] VIT Bhopal Univ, Sch Adv Sci & Language, Bhopal 466114, India
[2] State Planning Inst, Planning Dept, New Div, Lucknow 226001, Uttar Pradesh, India
[3] Kardan Univ, Dept Econ, Kabul 1001, Afghanistan
[4] Lovely Profess Univ, Div Res & Dev, Phagwara 144001, Punjab, India
[5] Chitkara Univ, Inst Engn & Technol, Ctr Res Impact & Outcome, Rajpura 140401, Punjab, India
Keywords
Random drift optimization; XGBoost; Feature selection; Cancer classification; Microarray data analysis; ALGORITHM;
DOI
10.1007/s00262-024-03843-x
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintaining high predictive accuracy. In response to these challenges, we introduce a novel feature selection approach that integrates Random Drift Optimization (RDO) with XGBoost, specifically designed to enhance the performance of cancer classification tasks. Our proposed framework not only improves classification accuracy but also offers valuable insights into the underlying biological mechanisms driving cancer progression. Through comprehensive experiments conducted on real-world cancer datasets, including Central Nervous System (CNS), Leukemia, Breast, and Ovarian cancers, we demonstrate the efficacy of our method in identifying a smaller subset of unique and relevant genes. This selection results in significantly improved classification efficiency and accuracy. When compared with popular classifiers such as Support Vector Machine, K-Nearest Neighbor, and Naive Bayes, our approach consistently outperforms these models in terms of both accuracy and F-measure metrics. For instance, our framework achieved an accuracy of 97.24% in the CNS dataset, 99.14% in Leukemia, 95.21% in Ovarian, and 87.62% in Breast cancer, showcasing its robustness and effectiveness across different types of cancer data. These results underline the potential of our RDO-XGBoost framework as a promising solution for feature selection in cancer data analysis, offering enhanced predictive performance and valuable biological insights.
Pages: 14