Radial-Based Oversampling for Multiclass Imbalanced Data Classification

Cited by: 72
Authors
Krawczyk, Bartosz [1]
Koziarski, Michal [2]
Wozniak, Michal [3]
Affiliations
[1] Virginia Commonwealth Univ, Sch Engn, Dept Comp Sci, Richmond, VA 23284 USA
[2] AGH Univ Sci & Technol, Dept Elect, PL-30059 Krakow, Poland
[3] Wroclaw Univ Sci & Technol, Dept Syst & Comp Networks, PL-50370 Wroclaw, Poland
Keywords
Training; Learning systems; Taxonomy; Machine learning; Task analysis; Clustering algorithms; Proposals; Imbalanced data; machine learning; multiclass imbalance; oversampling; OVER-SAMPLING TECHNIQUE; MINORITY CLASS; DATA-SETS; SMOTE;
DOI
10.1109/TNNLS.2019.2913673
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning from imbalanced data is among the most popular topics in contemporary machine learning. However, the vast majority of attention in this field is given to binary problems, while their much more difficult multiclass counterparts remain relatively unexplored. Handling data sets with multiple skewed classes poses various challenges and calls for a better understanding of the relationships among classes. In this paper, we propose multiclass radial-based oversampling (MC-RBO), a novel data-sampling algorithm dedicated to multiclass problems. The main novelty of our method lies in using potential functions for generating artificial instances. Contrary to existing multiclass oversampling approaches, which use only minority class characteristics, we take into account information coming from all of the classes. The process of artificial instance generation is guided by exploring areas where the value of the mutual class distribution is very small. In this way, we ensure a smart oversampling procedure that can cope with difficult data distributions and alleviate the shortcomings of existing methods. The usefulness of the MC-RBO algorithm is evaluated in an extensive experimental study and backed up by a thorough statistical analysis. The obtained results show that by taking into account information coming from all of the classes and conducting a smart oversampling, we can significantly improve the process of learning from multiclass imbalanced data.
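The abstract describes instance generation that is guided by potential functions built from all classes. The code below is a minimal pure-Python sketch of that idea under stated assumptions, and it is not the authors' MC-RBO implementation. It uses a Gaussian radial basis function. It defines a hypothetical `mutual_potential` for a target class as the summed RBF contributions of all other classes minus those of the target class. Each synthetic point starts at a randomly chosen instance of that class and takes random steps, keeping a step only when it brings the potential closer to zero. The following are all illustrative assumptions: the function names (`rbf`, `mutual_potential`, `oversample_class`, `balance_all`), the parameters `gamma`, `step` and `iterations`, the zero-seeking acceptance rule, and balancing every class up to the largest one.

```python
import math
import random
from typing import Dict, List, Sequence

Point = List[float]
Dataset = Dict[int, List[Point]]  # class label -> list of feature vectors


def rbf(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian radial basis function exp(-(||x - y|| / gamma)^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / gamma ** 2)


def mutual_potential(x: Sequence[float], data: Dataset, target: int, gamma: float) -> float:
    """Hypothetical multiclass potential: RBF mass of all other classes minus that of `target`."""
    others = sum(rbf(x, p, gamma) for c, pts in data.items() if c != target for p in pts)
    own = sum(rbf(x, p, gamma) for p in data[target])
    return others - own


def oversample_class(data: Dataset, target: int, n_new: int, gamma: float = 1.0,
                     step: float = 0.1, iterations: int = 100, seed: int = 0) -> List[Point]:
    """Generate n_new points for `target` via a random walk that drives |potential| toward zero."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        x = list(rng.choice(data[target]))  # start from a random instance of the target class
        score = abs(mutual_potential(x, data, target, gamma))
        for _ in range(iterations):
            # Propose a random translation of every coordinate by at most `step`.
            cand = [xi + rng.uniform(-step, step) for xi in x]
            cand_score = abs(mutual_potential(cand, data, target, gamma))
            if cand_score < score:  # accept only moves toward a smaller |potential|
                x, score = cand, cand_score
        synthetic.append(x)
    return synthetic


def balance_all(data: Dataset, **kwargs) -> Dataset:
    """Oversample every class up to the size of the largest class, using the original data only."""
    largest = max(len(pts) for pts in data.values())
    balanced = {c: list(pts) for c, pts in data.items()}
    for c, pts in data.items():
        if len(pts) < largest:
            balanced[c].extend(oversample_class(data, c, largest - len(pts), **kwargs))
    return balanced


# Toy three-class example: class 0 is the majority, classes 1 and 2 are minorities.
data: Dataset = {
    0: [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.25, 0.05]],
    1: [[1.0, 1.0], [1.1, 0.9]],
    2: [[0.0, 1.0]],
}
balanced = balance_all(data, gamma=0.5, step=0.05, iterations=50)
print({c: len(pts) for c, pts in balanced.items()})  # {0: 5, 1: 5, 2: 5}
```

To keep the sketch short, it processes each class independently against the original data. The published algorithm may order the classes, or reuse previously generated instances, in a different way.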
Pages: 2818-2831
Number of pages: 14