A hybrid stacking classifier with feature selection for handling imbalanced data

Cited by: 0
Authors
Abraham A. [1 ]
Kayalvizhi R. [1 ]
Mohideen H.S. [2 ]
Affiliations
[1] Department of Networking and Communications, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai
[2] Department of Genetic Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chennai
Keywords
Machine learning; multi classification; Ovarian cancer; Pickle; Random Forest
DOI
10.3233/JIFS-236197
Abstract
Cancer has become increasingly alarming. This paper focuses on Epithelial Ovarian Cancer (EOC), the most significant form of ovarian cancer (OC), because of its low survival rate. The proposed method, ‘Multi classifier ShapRFECV based EOC’ (MSRFECV-EOC), is a subtype analysis technique that uses EOC data from the National Centre for Biotechnology Information and Cancer Cell Line Encyclopedia websites for early identification of EOC with machine learning techniques. The approach increases the data size, balances the classes in the data, and removes the large number of features unrelated to the disease of interest in order to prevent overfitting. In the data preprocessing stage, OC-related gene names were taken from the Cancermine database and other OC-related works. The OC datasets were then merged on these OC genes, and missing values of EOC subtypes were identified and imputed using Iterative Logistic Imputation. The Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors was applied to the imputed dataset. In the feature selection phase, the most significant features for the EOC subtypes were identified with ShapRFECV, a Recursive Feature Elimination with Cross-Validation algorithm based on Shapley Additive Explanations (SHAP), which preserves predefined features while selecting new EOC features. An Optuna-optimized Random Forest then achieved an accuracy of 97%, outperforming the existing models, and SHAP plots showed the most prominent features behind the classification. Saving the trained model with Pickle preserves its learned parameter values and saves considerable training time. In the final phase, a Stratified K-Fold Stacking Classifier improved the accuracy to 98.9%. © 2024 – IOS Press. All rights reserved.
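Two steps named in the abstract, stratified K-fold splitting and model persistence with Pickle, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `stratified_folds` and `train_test_splits`, the toy labels, and the dictionary standing in for a trained model are all hypothetical. In practice a library such as scikit-learn (`StratifiedKFold`, `StackingClassifier`) would more likely be used.

```python
import pickle
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        # Deal each class's samples round-robin so every fold gets its share.
        for position, idx in enumerate(by_class[label]):
            folds[position % k].append(idx)
    return [sorted(fold) for fold in folds]


def train_test_splits(labels, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k)
    for i, test in enumerate(folds):
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test


# Hypothetical imbalanced labels: 12 samples of subtype "A", 4 of subtype "B".
labels = ["A"] * 12 + ["B"] * 4
splits = list(train_test_splits(labels, k=4))

# Persist a stand-in "model" (a plain dict here) and restore it without retraining.
model = {"n_folds": len(splits), "classes": sorted(set(labels))}
restored = pickle.loads(pickle.dumps(model))
```

Dealing each class round-robin keeps the minority class present in every test fold, which is the property stratification is meant to guarantee on imbalanced data. The Pickle round trip shows why retraining can be skipped: the restored object compares equal to the one that was saved.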
Pages: 9103 - 9117
Page count: 14
Related papers
50 in total
  • [41] Sentiment classification using hybrid feature selection and ensemble classifier
    Jain, Achin
    Jain, Vanita
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (02) : 659 - 668
  • [42] Feature selection using a hybrid associative classifier with masking techniques
    Aldape-Perez, M.
    Yanez-Marquez, C.
    Lopez Leyva, L. O.
    MICAI 2006: FIFTH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, : 151 - +
  • [43] A Hybrid Feature Selection with Ensemble Classification for Imbalanced Healthcare Data: A Case Study for Brain Tumor Diagnosis
    Huda S.
    Yearwood J.
    Jelinek H.F.
    Hassan M.M.
    Fortino G.
    Buckland M.
    IEEE Access, 2016, 4 : 9145 - 9154
  • [44] Weighted Gini Index Feature Selection Method for Imbalanced Data
    Liu, Haoyue
    Zhou, MengChu
    Lu, Xiaoyu Sean
    Yao, Cynthia
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON NETWORKING, SENSING AND CONTROL (ICNSC), 2018,
  • [45] Feature selection via minimizing global redundancy for imbalanced data
    Shuhao Huang
    Hongmei Chen
    Tianrui Li
    Hao Chen
    Chuan Luo
    Applied Intelligence, 2022, 52 : 8685 - 8707
  • [46] A Tabular Variational Auto Encoder-Based Hybrid Model for Imbalanced Data Classification With Feature Selection
    Abraham, Asha
    Mohideen, Habeeb Shaik
    Kayalvizhi, R.
    IEEE ACCESS, 2023, 11 : 122760 - 122771
  • [47] A Hybrid Feature Selection With Ensemble Classification for Imbalanced Healthcare Data A Case Study for Brain Tumor Diagnosis
    Huda, Shamsul
    Yearwood, John
    Jelinek, Herbert F.
    Hassan, Mohammad Mehedi
    Fortino, Giancarlo
    Buckland, Michael
    IEEE ACCESS, 2016, 4 : 9145 - 9154
  • [48] Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem
    Qasim, Hayder Mohammed
    Ata, Oguz
    Ansari, Mohammad Azam
    Alomary, Mohammad N.
    Alghamdi, Saad
    Almehmadi, Mazen
    MEDICINA-LITHUANIA, 2021, 57 (11):
  • [49] Handling Imbalanced Data: A Survey
    Rout, Neelam
    Mishra, Debahuti
    Mallick, Manas Kumar
    INTERNATIONAL PROCEEDINGS ON ADVANCES IN SOFT COMPUTING, INTELLIGENT SYSTEMS AND APPLICATIONS, ASISA 2016, 2018, 628 : 431 - 443
  • [50] Preprocessed dynamic classifier ensemble selection for highly imbalanced drifted data streams
    Zyblewski, Pawel
    Sabourin, Robert
    Wozniak, Michal
    INFORMATION FUSION, 2021, 66 : 138 - 154