A hybrid stacking classifier with feature selection for handling imbalanced data

Cited by: 0
Authors
Abraham A. [1]
Kayalvizhi R. [1]
Mohideen H.S. [2]
Affiliations
[1] Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology Kattankulathur, Chennai
[2] Department of Genetic Engineering, College of Engineering and Technology, SRM Institute of Science and Technology Kattankulathur, Chennai
Keywords
Machine learning; Multi classification; Ovarian cancer; Pickle; Random Forest
DOI
10.3233/JIFS-236197
Abstract
Cancer has become an increasingly alarming health concern. This paper focuses on Epithelial Ovarian Cancer (EOC), the most significant form of ovarian cancer (OC), because of its low survival rate. The proposed technique, 'Multi classifier ShapRFECV based EOC' (MSRFECV-EOC) subtype analysis, uses EOC data from the National Centre for Biotechnology Information and Cancer Cell Line Encyclopedia websites for early identification of EOC with machine learning. The approach increases the data size, balances the classes, and removes the large number of features unrelated to the disease of interest in order to prevent overfitting. In the data preprocessing stage, OC-related gene names were taken from the Cancermine database and other OC-related works. The OC datasets were then merged on the OC genes, and missing values of EOC subtypes were identified and imputed using Iterative Logistic Imputation. The Synthetic Minority Oversampling Technique with Edited Nearest Neighbors was applied to the imputed dataset. In the feature selection phase, the most significant features for the EOC subtypes were identified with the Shapley Additive Explanations based Recursive Feature Elimination with Cross-Validation (ShapRFECV) algorithm, which preserved predefined features while selecting new EOC features. An Optuna-optimized Random Forest achieved an accuracy of 97%, outperforming the existing models. SHAP plots showed the most prominent features behind the classification. Serializing the trained model with Pickle preserves its internal parameter values and saves considerable training time. In the final phase, a Stratified K-Fold Stacking Classifier improved the accuracy to 98.9%. © 2024 – IOS Press. All rights reserved.
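The abstract names stratified K-fold splitting as part of the final stacking phase but gives no detail. The following is a minimal sketch of the general idea only, not the authors' implementation: each fold is built so that it keeps roughly the same class proportions as the full dataset, which matters when one class is rare. The function name `stratified_k_fold` and the toy labels are hypothetical and chosen for illustration; the sketch uses only the Python standard library.

```python
from collections import Counter, defaultdict
import random


def stratified_k_fold(labels, n_splits=5, seed=0):
    """Return (train_idx, test_idx) pairs that preserve class proportions.

    Indices of each class are shuffled and dealt round-robin into the
    folds, so every fold receives a near-equal share of every class.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices across the folds in turn.
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    splits = []
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits


# Toy imbalanced labels (hypothetical, not the EOC data): 20 vs. 5 samples.
labels = ["majority"] * 20 + ["minority"] * 5
splits = stratified_k_fold(labels, n_splits=5)
for train_idx, test_idx in splits:
    print(Counter(labels[i] for i in test_idx))
```

With these toy labels, every test fold contains four majority samples and one minority sample, so no fold is left without the rare class. In a stacking setup, folds of this kind are typically used to produce out-of-fold predictions from the base models, which then serve as training inputs for the meta-classifier.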
Pages: 9103-9117
Page count: 14