Resampling approach for imbalanced data classification based on class instance density per feature value intervals

Times cited: 0
Authors
Wang, Fei [1]
Zheng, Ming [1, 2]
Ma, Kai [1]
Hu, Xiaowen [1]
Affiliations
[1] Anhui Normal Univ, Sch Comp & Informat, Wuhu 241002, Peoples R China
[2] Anhui Normal Univ, Anhui Prov Key Lab Ind Intelligence Data Secur, Wuhu 241002, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced datasets; Resampling; Classification; Class instance density; SMOTE
DOI
10.1016/j.ins.2024.121570
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In practical applications, imbalanced datasets significantly degrade the classification performance of machine learning models. However, most conventional resampling approaches do not adequately account for the differing contributions of individual features to the classification model. To address this shortcoming, this study introduces three novel resampling approaches. The first, Oversampling based on class instance density per feature value intervals (OCF), augments the dataset. The second, Undersampling based on class instance density per feature value intervals (UCF), reduces the dataset size. The third, Hybrid sampling based on class instance density per feature value intervals (HSCF), performs oversampling and undersampling simultaneously. These approaches partition the values of each feature into intervals according to their information content, calculate class instance densities within these intervals, and generate feature values in the intervals that carry high discriminative information. The generated feature values are then combined to synthesize minority class instances, thereby achieving oversampling. In addition, the study combines class instance density with feature importance to identify the majority class instances at the classification boundary that contribute least, and removes them to perform undersampling. Adjustable sampling ratios, together with the integration of OCF and UCF, enable hybrid sampling. Finally, experiments on the benchmark dataset demonstrate the effectiveness and superiority of the proposed methods. Furthermore, the proposed methods are observed to enhance the feature-splitting capability of decision tree classifiers; consequently, they achieve their best results, and the largest improvements in classification performance, when combined with decision tree classifiers.
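The abstract describes the interval-density idea only in words. The Python sketch below illustrates one possible reading of it, under assumptions that are not taken from the paper: intervals are equal-width (the abstract says intervals reflect information content but not how they are built), the "class instance density" of an interval is taken to be the minority share of the instances in it, synthetic feature values are drawn independently per feature, and the per-feature `importance` vector used for undersampling is supplied by the caller. The functions `feature_bins`, `minority_density`, `oversample`, and `undersample` are hypothetical names; this is not the OCF, UCF, or HSCF code from the authors' repository.

```python
import random
from collections import defaultdict


def feature_bins(column, n_bins):
    """Map each value of one feature to an equal-width interval index (hypothetical choice)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # a constant feature falls into interval 0
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def minority_density(bins, labels, minority):
    """Share of the instances in each interval that belong to the minority class."""
    total, minor = defaultdict(int), defaultdict(int)
    for b, label in zip(bins, labels):
        total[b] += 1
        minor[b] += int(label == minority)
    return {b: minor[b] / total[b] for b in total}


def oversample(X, y, minority, n_new, n_bins=5, seed=0):
    """Build n_new synthetic minority rows one feature at a time.

    For every feature, an interval holding minority values is drawn with weight
    (minority density times minority count), and a value is drawn uniformly
    between the smallest and largest minority values in it. Features are
    sampled independently of one another, which is a simplifying assumption.
    """
    rng = random.Random(seed)
    samplers = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        bins = feature_bins(column, n_bins)
        density = minority_density(bins, y, minority)
        groups = defaultdict(list)  # interval index -> minority values in it
        for v, b, label in zip(column, bins, y):
            if label == minority:
                groups[b].append(v)
        intervals = sorted(groups)
        weights = [density[b] * len(groups[b]) for b in intervals]
        samplers.append((intervals, weights, groups))
    synthetic = []
    for _ in range(n_new):
        row = []
        for intervals, weights, groups in samplers:
            b = rng.choices(intervals, weights=weights)[0]
            row.append(rng.uniform(min(groups[b]), max(groups[b])))
        synthetic.append(row)
    return X + synthetic, y + [minority] * n_new


def undersample(X, y, minority, n_remove, importance, n_bins=5):
    """Drop the n_remove majority rows whose values sit in minority-dense intervals.

    `importance` holds one weight per feature (e.g. from a fitted tree); how the
    paper actually measures feature importance is not stated in the abstract.
    """
    bins = [feature_bins([row[j] for row in X], n_bins) for j in range(len(X[0]))]
    dens = [minority_density(b, y, minority) for b in bins]

    def overlap(i):
        # Importance-weighted minority density over the row's intervals.
        return sum(w * dens[j][bins[j][i]] for j, w in enumerate(importance))

    majority = [i for i, label in enumerate(y) if label != minority]
    drop = set(sorted(majority, key=overlap, reverse=True)[:n_remove])
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: class 1 is the minority; the last row is a majority point inside
# the minority region of feature 0.
X = [[0.1, 5.0], [0.2, 4.8], [0.9, 1.0], [0.8, 1.2], [0.85, 0.9], [0.95, 1.1], [0.15, 1.0]]
y = [1, 1, 0, 0, 0, 0, 0]
X_over, y_over = oversample(X, y, minority=1, n_new=3)
X_under, y_under = undersample(X, y, minority=1, n_remove=1, importance=[0.5, 0.5])
```

Here the synthetic minority rows stay inside the intervals where the minority class is dense, and the undersampling step removes the one majority row that overlaps the minority region. Calling `oversample` and then `undersample` with chosen `n_new` and `n_remove` values gives only a crude analogue of the adjustable hybrid ratio described for HSCF.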
All code is publicly available at https://github.com/Wangfeiopen/HSCF.
Pages: 44