An empirical study on the joint impact of feature selection and data resampling on imbalance classification

Times cited: 27
Authors
Zhang, Chongsheng [1]
Soda, Paolo [2, 3]
Bi, Jingjun [1]
Fan, Gaojuan [1]
Almpanidis, George [1]
Garcia, Salvador [4]
Ding, Weiping [5]
Affiliations
[1] Henan Univ, Henan Key Lab Big Data Anal & Proc, Kaifeng, Henan, Peoples R China
[2] Univ Campus Biomed Rome, Dept Engn, Rome, Italy
[3] Umea Univ, Dept Radiat Sci, Biomed Engn, Radiat Phys, Umea, Sweden
[4] Univ Granada, DaSCI Andalusian Res Inst, Granada, Spain
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
Keywords
Imbalanced classification; Feature selection; Data selection; Resampling; SMOTE
DOI
10.1007/s10489-022-03772-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have a very small number of samples. Data resampling has proven to be effective in alleviating such imbalanced settings, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalance classification has rarely been addressed before. This work investigates the performance of two opposite imbalanced classification frameworks in which feature selection is applied before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered for finding the best performing imbalanced classification model. We also study the impact of classifiers, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) on the performance of imbalance classification. Overall, this work provides a new reference value for researchers and practitioners in imbalance learning.
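As a rough illustration of the quantities named in the abstract, the sketch below computes IR and SFR on a toy two-class dataset and runs the two orderings (feature selection before resampling, and resampling before feature selection). It is not the authors' code: random oversampling stands in for SMOTE, the mean-difference feature ranking is a placeholder selector, and every function name and the toy data are hypothetical.

```python
import random
from collections import Counter

def imbalance_ratio(labels):
    """IR: majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def sample_feature_ratio(rows):
    """SFR: number of samples divided by number of features."""
    return len(rows) / len(rows[0])

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until classes are balanced.
    Assumption: a simple stand-in for SMOTE, which would interpolate instead."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    pool = [r for r, y in zip(rows, labels) if y == minority]
    n_extra = max(counts.values()) - counts[minority]
    extra = [rng.choice(pool) for _ in range(n_extra)]
    return rows + extra, labels + [minority] * n_extra

def top_k_features(rows, labels, k):
    """Placeholder selector: rank features by |mean(class a) - mean(class b)|."""
    a, b = sorted(set(labels))
    def class_mean(j, c):
        vals = [r[j] for r, y in zip(rows, labels) if y == c]
        return sum(vals) / len(vals)
    scores = [abs(class_mean(j, a) - class_mean(j, b)) for j in range(len(rows[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def project(rows, idx):
    """Keep only the selected feature columns."""
    return [[r[j] for j in idx] for r in rows]

def fs_then_resample(rows, labels, k):
    """Framework A: select features on the original data, then resample."""
    idx = top_k_features(rows, labels, k)
    new_rows, new_labels = random_oversample(project(rows, idx), labels)
    return new_rows, new_labels, idx

def resample_then_fs(rows, labels, k):
    """Framework B: resample first, then select features on the balanced data."""
    rs_rows, rs_labels = random_oversample(rows, labels)
    idx = top_k_features(rs_rows, rs_labels, k)
    return project(rs_rows, idx), rs_labels, idx

# Toy data (hypothetical): 5 samples, 3 features, 4 majority vs 1 minority.
X = [[1.0, 5.0, 0.2], [1.1, 4.0, 0.1], [0.9, 6.0, 0.3],
     [1.0, 5.5, 0.2], [3.0, 5.1, 0.2]]
y = [0, 0, 0, 0, 1]

print("IR:", imbalance_ratio(y), "SFR:", sample_feature_ratio(X))
print("A:", fs_then_resample(X, y, k=1)[2], "B:", resample_then_fs(X, y, k=1)[2])
```

On real data the two orderings can pick different features, because the selector sees a different class distribution in each case; comparing both, as the abstract suggests, would mean training and scoring a classifier on each output.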
Pages: 5449-5461
Page count: 13
Related papers
50 in total
  • [1] An empirical study on the joint impact of feature selection and data resampling on imbalance classification
    Chongsheng Zhang
    Paolo Soda
    Jingjun Bi
    Gaojuan Fan
    George Almpanidis
    Salvador García
    Weiping Ding
    Applied Intelligence, 2023, 53 : 5449 - 5461
  • [2] Comprehensive empirical investigation for prioritizing the pipeline of using feature selection and data resampling techniques
    Tyagi, Pooja
    Singh, Jaspreeti
    Gosain, Anjana
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (03) : 6019 - 6040
  • [3] An Approach Based on Resampling and Feature Selection to Improve the Classification of Microarray Data
    Soleymani, Nafiseh
    Moattar, Mohammad Hussein
    2018 6TH IRANIAN JOINT CONGRESS ON FUZZY AND INTELLIGENT SYSTEMS (CFIS), 2018, : 61 - 64
  • [4] Joint imbalanced classification and feature selection for hospital readmissions
    Du, Guodong
    Zhang, Jia
    Luo, Zhiming
    Ma, Fenglong
    Ma, Lei
    Li, Shaozi
    KNOWLEDGE-BASED SYSTEMS, 2020, 200
  • [5] Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data
    Welvaars, Koen
    Oosterhoff, Jacobien H. F.
    van den Bekerom, Michel P. J.
    Doornberg, Job N.
    van Haarst, Ernst P.
    JAMIA OPEN, 2023, 6 (02)
  • [6] Similarity of feature selection methods: An empirical study across data intensive classification tasks
    Dessi, Nicoletta
    Pes, Barbara
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (10) : 4632 - 4642
  • [7] Impact of feature selection methods on data classification for IDS
    Jiang, Shuai
    Xu, Xiaolong
    2019 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC), 2019, : 174 - 180
  • [8] Empirical Study of Individual Feature Evaluators and Cutting Criteria for Feature Selection in Classification
    Arauzo-Azofra, Antonio
    Aznarte M, Jose L.
    Benitez, Jose M.
    2009 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2009, : 541 - +
  • [9] EasyEnsemble and Feature Selection for Imbalance Data Sets
    Liu, Tian-Yu
    2009 INTERNATIONAL JOINT CONFERENCE ON BIOINFORMATICS, SYSTEMS BIOLOGY AND INTELLIGENT COMPUTING, PROCEEDINGS, 2009, : 517 - 520
  • [10] Empirical study of feature selection methods based on individual feature evaluation for classification problems
    Arauzo-Azofra, Antonio
    Aznarte, Jose Luis
    Benitez, Jose M.
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (07) : 8170 - 8177