An empirical study on the joint impact of feature selection and data resampling on imbalance classification

被引:28
|
作者
Zhang, Chongsheng [1 ]
Soda, Paolo [2 ,3 ]
Bi, Jingjun [1 ]
Fan, Gaojuan [1 ]
Almpanidis, George [1 ]
Garcia, Salvador [4 ]
Ding, Weiping [5 ]
机构
[1] Henan Univ, Henan Key Lab Big Data Anal & Proc, Kaifeng, Henan, Peoples R China
[2] Univ Campus Biomed Rome, Dept Engn, Rome, Italy
[3] Umea Univ, Dept Radiat Sci, Biomed Engn, Radiat Phys, Umea, Sweden
[4] Univ Granada, DaSCI Andalusian Res Inst, Granada, Spain
[5] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
关键词
Imbalanced classification; Feature selection; Data selection; Resampling; SMOTE;
D O I
10.1007/s10489-022-03772-1
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have a very small number of samples. Data resampling has proven to be effective in alleviating such imbalanced settings, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalance classification has rarely been addressed before. This work investigates the performance of two opposite imbalanced classification frameworks in which feature selection is applied before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered for finding the best performing imbalanced classification model. We also study the impact of classifiers, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) on the performance of imbalance classification. Overall, this work provides a new reference value for researchers and practitioners in imbalance learning.
引用
收藏
页码:5449 / 5461
页数:13
相关论文
共 50 条
  • [41] Impact of Feature Selection in Classification for Hidden Channel Detection on the Example of Audio Data Hiding
    Kraetzer, Christian
    Dittmann, Jana
    MM&SEC'08: PROCEEDINGS OF THE MULTIMEDIA & SECURITY WORKSHOP 2008, 2008, : 159 - 166
  • [42] Feature Selection and Ensemble Meta Classifier for Multiclass Imbalance Data Learning
    Sainin, Mohd Shamrie
    Alfred, Rayner
    Alias, Suraya
    Lammasha, Mohamed A. M.
    PROCEEDINGS OF KNOWLEDGE MANAGEMENT INTERNATIONAL CONFERENCE (KMICE) 2018, 2018, : 134 - 139
  • [43] Ship Classification in SAR Image by Joint Feature and Classifier Selection
    Lang, Haitao
    Zhang, Jie
    Zhang, Xi
    Meng, Junmin
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2016, 13 (02) : 212 - 216
  • [44] A novel oversampling and feature selection hybrid algorithm for imbalanced data classification
    Feng, Fang
    Li, Kuan-Ching
    Yang, Erfu
    Zhou, Qingguo
    Han, Lihong
    Hussain, Amir
    Cai, Mingjiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (03) : 3231 - 3267
  • [45] Identifying Feature Pattern for Weighted Imbalance Data: A Feature Selection Study for Thoracolumbar Spine Fractures in Crash Injury Research
    Nitu, Paromita S.
    Madiraju, Praveen
    Pintar, Frank A.
    2020 IEEE 21ST INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2020), 2020, : 142 - 147
  • [46] Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data
    Patil, Abhijeet R.
    Kim, Sangjin
    MATHEMATICS, 2020, 8 (01)
  • [47] A comparative study of feature selection and classification methods for gene expression data of glioma
    Abusamra, Heba
    4TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SYSTEMS-BIOLOGY AND BIOINFORMATICS (CSBIO2013), 2013, 23 : 5 - 14
  • [48] Distributed feature selection: An application to microarray data classification
    Bolon-Canedo, V.
    Sanchez-Marono, N.
    Alonso-Betanzos, A.
    APPLIED SOFT COMPUTING, 2015, 30 : 136 - 150
  • [49] Feature Selection for the Classification of Longitudinal Human Ageing Data
    Pomsuwan, Tossapol
    Freitas, Alex A.
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2017), 2017, : 739 - 746
  • [50] A New Feature Selection Algorithm for Stream Data Classification
    Wankhade, Kapil
    Rane, Dhiraj
    Thool, Ravindra
    2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1843 - 1848