Resampling approach for imbalanced data classification based on class instance density per feature value intervals

Cited by: 0
Authors
Wang, Fei [1]
Zheng, Ming [1, 2]
Ma, Kai [1]
Hu, Xiaowen [1]
Affiliations
[1] Anhui Normal Univ, Sch Comp & Informat, Wuhu 241002, Peoples R China
[2] Anhui Normal Univ, Anhui Prov Key Lab Ind Intelligence Data Secur, Wuhu 241002, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced datasets; Resampling; Classification; Class instance density; SMOTE
DOI
10.1016/j.ins.2024.121570
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In practical applications, imbalanced datasets significantly degrade the classification performance of machine learning models. However, most conventional resampling approaches do not adequately account for the varying contributions of individual features to the classification model. To address this shortcoming, this study introduces three novel resampling approaches. The first, Oversampling based on class instance density per feature value intervals (OCF), augments the dataset. The second, Undersampling based on class instance density per feature value intervals (UCF), reduces the dataset size. The third, Hybrid sampling based on class instance density per feature value intervals (HSCF), performs oversampling and undersampling simultaneously. These approaches divide feature values into intervals according to their information content, calculate class instance densities within these intervals, and generate feature values in intervals with high discriminative information. The generated feature values are then combined to synthesize minority class data, thereby achieving oversampling. In addition, the study combines class instance density and feature importance to identify the majority class data at the classification boundary that contribute least, and removes them to perform undersampling. Adjustable sampling ratios and the integration of OCF and UCF enable hybrid sampling. Finally, experiments on the benchmark dataset demonstrate the superiority and effectiveness of the proposed method. Furthermore, the proposed method is observed to enhance the feature-splitting capability of decision tree classifiers. The best results, and the largest improvements in classification performance, are therefore achieved when it is combined with decision tree classifiers.
All code has been published at https://github.com/Wangfeiopen/HSCF.
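The oversampling idea summarized in the abstract (split each feature's values into intervals, measure class instance density per interval, generate feature values in the most discriminative intervals, then combine them into synthetic minority instances) can be illustrated with a minimal Python sketch. This is not the authors' OCF implementation. It assumes equal-width intervals, takes "class instance density" to mean the minority share of the instances in an interval, and samples every feature independently; the abstract specifies none of these details. All function names and parameters are hypothetical.

```python
import random
from bisect import bisect_right


def interval_edges(values, n_bins):
    """Equal-width interval edges over one feature (an assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + k * width for k in range(n_bins)] + [hi]


def bin_index(value, edges):
    """Index of the half-open interval containing value; the maximum joins the last bin."""
    n_bins = len(edges) - 1
    return min(bisect_right(edges, value) - 1, n_bins - 1)


def minority_density(values, labels, minority_label, edges):
    """Minority share of the instances in each interval (assumed density definition)."""
    n_bins = len(edges) - 1
    total = [0] * n_bins
    minority = [0] * n_bins
    for value, label in zip(values, labels):
        k = bin_index(value, edges)
        total[k] += 1
        if label == minority_label:
            minority[k] += 1
    return [m / t if t else 0.0 for m, t in zip(minority, total)]


def oversample_by_interval_density(X, y, minority_label, n_new, n_bins=5, seed=None):
    """Append n_new synthetic minority rows built feature by feature."""
    rng = random.Random(seed)
    new_columns = []
    for values in zip(*X):  # iterate over feature columns
        edges = interval_edges(values, n_bins)
        density = minority_density(values, y, minority_label, edges)
        if sum(density) == 0:
            raise ValueError("no minority instances found")
        # Pick intervals in proportion to their minority density, then draw a
        # value uniformly inside each picked interval.
        picked = rng.choices(range(n_bins), weights=density, k=n_new)
        new_columns.append([rng.uniform(edges[k], edges[k + 1]) for k in picked])
    synthetic = [list(row) for row in zip(*new_columns)]
    return [list(row) for row in X] + synthetic, list(y) + [minority_label] * n_new
```

Sampling each feature independently is a simplification that ignores correlations between features. The undersampling and hybrid variants described in the abstract also depend on feature importance, which this sketch does not model.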
Pages: 44
Related papers
50 in total
  • [1] A novel instance density-based hybrid resampling for imbalanced classification problems
    You-Jin Park
    Chung-Kang Ma
    Soft Computing, 2025, 29 (4) : 2031 - 2045
  • [2] RBSP-Boosting: A Shapley value-based resampling approach for imbalanced data classification
    Chong, Weitu
    Chen, Ningjiang
    Fang, Chengyun
    INTELLIGENT DATA ANALYSIS, 2022, 26 (06) : 1579 - 1595
  • [3] Enhancing associative classification on imbalanced data through ontology-based feature extraction and resampling
    Kouhoue, Joel Mba
    Lonlac, Jerry
    Lesage, Alexis
    Doniec, Arnaud
    Lecoeuche, Stephane
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [4] An Approach to Imbalanced Data Classification Based on Instance Selection and Over-Sampling
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT I, 2019, 11683 : 601 - 610
  • [5] An Approach Based on Resampling and Feature Selection to Improve the Classification of Microarray Data
    Soleymani, Nafiseh
    Moattar, Mohammad Hussein
    2018 6TH IRANIAN JOINT CONGRESS ON FUZZY AND INTELLIGENT SYSTEMS (CFIS), 2018 : 61 - 64
  • [6] Imbalanced Data Classification Based on a Hybrid Resampling SVM Method
    Cao, Lu
    Zhai, Yikui
    IEEE 12TH INT CONF UBIQUITOUS INTELLIGENCE & COMP/IEEE 12TH INT CONF ADV & TRUSTED COMP/IEEE 15TH INT CONF SCALABLE COMP & COMMUN/IEEE INT CONF CLOUD & BIG DATA COMP/IEEE INT CONF INTERNET PEOPLE AND ASSOCIATED SYMPOSIA/WORKSHOPS, 2015 : 1533 - 1536
  • [7] Imbalanced educational data classification: an effective approach with resampling and random forest
    Vo Thi Ngoc Chau
    Nguyen Hua Phung
    PROCEEDINGS OF 2013 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES: RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2013 : 135 - 140
  • [8] FISA: Feature-based instance selection for imbalanced text classification
    Sun, Aixin
    Lim, Ee-Peng
    Benatallah, Boualem
    Hassan, Mahbub
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 250 - 254
  • [9] Cluster-Based Instance Selection for the Imbalanced Data Classification
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT II, 2018, 11056 : 191 - 200
  • [10] Instance importance based SVM for solving imbalanced data classification
    Yang, Yang
    Li, Shan-Ping
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2009, 22 (06) : 913 - 918