MKC-SMOTE: A Novel Synthetic Oversampling Method for Multi-Class Imbalanced Data Classification

Authors
Wang, Jiao [1 ,2 ]
Awang, Norhashidah [1 ]
Affiliations
[1] Univ Sains Malaysia, Sch Math Sci, George Town 11800, Malaysia
[2] Puer Univ, Sch Math & Stat, Puer 665000, Peoples R China
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Multi-class imbalanced dataset; classification; SMOTE algorithm; synthetic minority; oversampling; DATA-SETS;
DOI
10.1109/ACCESS.2024.3521120
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The learning of multi-class imbalance problems presents greater challenges and has fewer research results compared to binary imbalance problems. Resampling techniques are widely employed to address data imbalance problems. However, the majority of existing resampling methods are designed specifically for binary imbalance datasets and demonstrate significant limitations when applied to multi-class imbalance datasets. Therefore, this study introduces the MKC-SMOTE algorithm, a novel and effective method specifically tailored for multi-class imbalanced datasets. During the pre-processing phase, the algorithm takes into account the distribution of all classes and employs the k-nearest neighbors (kNN) algorithm to identify appropriate original samples for synthesizing minority class samples. It then utilizes an enhanced SMOTE algorithm for interpolation. In the post-processing phase, potentially misleading synthesized samples are eliminated by the undersampling technique. Consequently, the MKC-SMOTE algorithm generates high-quality minority class samples by strategically exploring the distributional regions of the classes. Extensive experiments were conducted on 21 real-world datasets, comparing the MKC-SMOTE algorithm with six imbalance problem handling methods and two classifiers. The results demonstrate that the MKC-SMOTE algorithm significantly enhances the classification performance of multi-class imbalanced datasets and outperforms several popular and state-of-the-art oversampling methods.
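The general pattern the abstract describes (pick original samples with kNN, interpolate SMOTE-style, then discard questionable synthetic points) can be illustrated with a minimal sketch. This is not the published MKC-SMOTE procedure: the function names, the majority-vote filter, and the parameters `k` and `seed` are assumptions made for illustration, and the interpolation is classic SMOTE rather than the enhanced variant the paper uses.

```python
import numpy as np

def nearest(X, query, k, exclude=None):
    """Indices of the k rows of X closest to query (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    if exclude is not None:
        dists[exclude] = np.inf  # never return the query sample itself
    return np.argsort(dists)[:k]

def oversample_sketch(X, y, k=5, seed=0):
    """Hypothetical multi-class SMOTE-style oversampling with a kNN filter.

    Every class smaller than the largest one is grown toward the largest
    class size. Illustrative only; not the authors' algorithm.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    new_X, new_y = [], []
    for cls, count in zip(classes, counts):
        if count == target or count < 2:
            continue  # nothing to add, or too few samples to interpolate
        X_cls = X[y == cls]
        k_eff = min(k, count - 1)
        made, attempts = 0, 0
        while made < target - count and attempts < 20 * target:
            attempts += 1
            i = rng.integers(count)
            base = X_cls[i]
            # Classic SMOTE step: interpolate toward a same-class neighbour.
            partner = X_cls[rng.choice(nearest(X_cls, base, k_eff, exclude=i))]
            synth = base + rng.random() * (partner - base)
            # Assumed filter: keep the point only if at least half of its
            # k nearest ORIGINAL samples (all classes) share its label.
            votes = y[nearest(X, synth, k)]
            if 2 * np.sum(votes == cls) >= len(votes):
                new_X.append(synth)
                new_y.append(cls)
                made += 1
    if not new_X:
        return X.copy(), y.copy()
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])
```

Calling `oversample_sketch(X, y)` on a feature matrix `X` and an integer label vector `y` returns the original rows followed by the accepted synthetic rows. Because rejected candidates are simply retried a bounded number of times, classes that overlap heavily with others may end up below the target size.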
Pages: 196929-196938
Number of pages: 10