An empirical study on the joint impact of feature selection and data resampling on imbalance classification

Cited by: 28
Authors
Zhang, Chongsheng [1]
Soda, Paolo [2, 3]
Bi, Jingjun [1]
Fan, Gaojuan [1]
Almpanidis, George [1]
Garcia, Salvador [4]
Ding, Weiping [5]
Affiliations
[1] Henan Univ, Henan Key Lab Big Data Anal & Proc, Kaifeng, Henan, Peoples R China
[2] Univ Campus Biomed Rome, Dept Engn, Rome, Italy
[3] Umea Univ, Dept Radiat Sci, Biomed Engn, Radiat Phys, Umea, Sweden
[4] Univ Granada, DaSCI Andalusian Res Inst, Granada, Spain
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
Keywords
Imbalanced classification; Feature selection; Data selection; Resampling; SMOTE
DOI
10.1007/s10489-022-03772-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have a very small number of samples. Data resampling has proven to be effective in alleviating such imbalanced settings, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalance classification has rarely been addressed before. This work investigates the performance of two opposite imbalanced classification frameworks in which feature selection is applied before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered for finding the best performing imbalanced classification model. We also study the impact of classifiers, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) on the performance of imbalance classification. Overall, this work provides a new reference value for researchers and practitioners in imbalance learning.
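The two quantities named in the abstract (IR and SFR) and the two orderings of feature selection and resampling can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, plain random oversampling stands in for SMOTE, and a toy class-mean-difference score stands in for a real feature selection method.

```python
import random
from collections import Counter

def imbalance_ratio(y):
    """IR: majority-class sample count divided by minority-class sample count."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())

def sample_feature_ratio(X):
    """SFR: number of samples divided by number of features."""
    return len(X) / len(X[0])

def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until classes are balanced (assumed stand-in for SMOTE)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def top_k_features(X, y, k):
    """Toy filter (assumption): score each feature by |mean(class a) - mean(class b)|."""
    a, b = sorted(set(y))
    def mean(j, label):
        vals = [row[j] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)
    scores = [abs(mean(j, a) - mean(j, b)) for j in range(len(X[0]))]
    best = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return sorted(best)

def project(X, idx):
    """Keep only the selected feature columns."""
    return [[row[j] for j in idx] for row in X]

def fs_then_resample(X, y, k):
    """Framework A: select features on the original data, then resample."""
    idx = top_k_features(X, y, k)
    X_res, y_res = random_oversample(project(X, idx), y)
    return idx, X_res, y_res

def resample_then_fs(X, y, k):
    """Framework B: resample first, then select features on the balanced data."""
    X_res, y_res = random_oversample(X, y)
    idx = top_k_features(X_res, y_res, k)
    return idx, project(X_res, idx), y_res

# Toy two-class data: 6 majority samples (label 0), 2 minority samples (label 1), 3 features.
X = [[0.0, 1.0, 5.0], [0.1, 0.9, 4.0], [0.2, 1.1, 6.0], [0.0, 1.0, 5.5],
     [0.1, 1.2, 4.5], [0.2, 0.8, 5.0], [1.0, 1.0, 5.0], [1.1, 1.1, 9.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

idx_a, Xa, ya = fs_then_resample(X, y, k=2)
idx_b, Xb, yb = resample_then_fs(X, y, k=2)
print("IR:", imbalance_ratio(y), "SFR:", sample_feature_ratio(X))
print("Selected (FS first):", idx_a, "Selected (resampling first):", idx_b)
```

On real data the two orderings can select different features, because the feature scores are computed on different sample sets; the abstract reports that both orderings should be tried when searching for the best model.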
Pages: 5449-5461
Number of pages: 13
Related papers
50 in total
  • [21] Impact of Membership and Non-membership Features on Classification Decision: An Empirical Study for Appraisal of Feature Selection Methods
    Abbasi, Bushra Zaheer
    Hussain, Shahid
    Bibi, Shaista
    Shah, Munam Ali
    2018 24TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND COMPUTING (ICAC'18), 2018 : 454 - 459
  • [22] Theoretical and empirical study on the potential inadequacy of mutual information for feature selection in classification
    Frenay, Benoit
    Doquire, Gauthier
    Verleysen, Michel
    NEUROCOMPUTING, 2013, 112 : 64 - 78
  • [23] An Empirical Evaluation of Feature Selection Stability and Classification Accuracy
    Buyukkececi, Mustafa
    Okur, Mehmet Cudi
    GAZI UNIVERSITY JOURNAL OF SCIENCE, 2024, 37 (02): : 606 - 620
  • [24] Effects of classification, feature selection, and resampling methods on bankruptcy prediction of small and medium-sized enterprises
    Papikova, Lenka
    Papik, Mario
    INTELLIGENT SYSTEMS IN ACCOUNTING FINANCE & MANAGEMENT, 2022, 29 (04) : 254 - 281
  • [25] Joint feature and instance selection using manifold data criteria: application to image classification
    Dornaika, Fadi
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (03) : 1735 - 1765
  • [27] Assessing feature selection method performance with class imbalance data
    Matharaarachchi, Surani
    Domaratzki, Mike
    Muthukumarana, Saman
    MACHINE LEARNING WITH APPLICATIONS, 2021, 6
  • [28] The impact of feature selection and feature reduction techniques for code smell detection: A comprehensive empirical study
    Zexian Zhang
    Lin Zhu
    Shuang Yin
    Wenhua Hu
    Shan Gao
    Haoxuan Chen
    Fuyang Li
    Automated Software Engineering, 2025, 32 (2)
  • [29] An empirical study to investigate the impact of data resampling techniques on the performance of class maintainability prediction models
    Malhotra, Ruchika
    Lata, Kusum
    NEUROCOMPUTING, 2021, 459 : 432 - 453
  • [30] Optimizing Neural Networks for Academic Performance Classification Using Feature Selection and Resampling Approach
    Supriyadi D.
    Purwanto P.
    Warsito B.
    Mendel, 2023, 29 (02) : 261 - 272