Radial-Based oversampling for noisy imbalanced data classification

Cited by: 94
Authors
Koziarski, Michal [1]
Krawczyk, Bartosz [2]
Wozniak, Michal [3]
Affiliations
[1] AGH Univ Sci & Technol, Dept Elect, Al Mickiewicza 30, PL-30059 Krakow, Poland
[2] Virginia Commonwealth Univ, Dept Comp Sci, 401 West Main St,POB 843019, Richmond, VA 23284 USA
[3] Wroclaw Univ Sci & Technol, Dept Syst & Comp Networks, Wybrzeze Wyspianskiego 27, PL-50370 Wroclaw, Poland
Keywords
Pattern classification; Machine learning; Imbalanced data; Oversampling; Radial basis functions; Noisy data; SAMPLING METHOD; MINORITY CLASS; SMOTE; IDENTIFICATION; EXAMPLES;
DOI
10.1016/j.neucom.2018.04.089
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data classification remains a focus of intense research, mostly due to the prevalence of data imbalance in various real-life application domains. A disproportion among objects from different classes may significantly affect the performance of standard classification models. The first problem is high imbalance ratios, which pose a serious learning difficulty and require dedicated methods capable of alleviating this issue. The second important problem is noise, which may accompany the training data and cause strong deterioration of classifier performance or increase the time required for training. Therefore, a desirable classification model should be robust to both skewed data distributions and noise. One of the most popular approaches for handling imbalanced data is oversampling of the minority objects in their neighborhood. In this work we criticize this approach and propose a novel strategy for dealing with imbalanced data, with particular focus on the presence of noise. We propose the Radial-Based Oversampling (RBO) method, which can find regions in which synthetic objects from the minority class should be generated on the basis of imbalance distribution estimation with radial basis functions. Results of experiments, carried out on a representative set of benchmark datasets, confirm that the proposed guided synthetic oversampling algorithm offers an interesting alternative to popular state-of-the-art solutions for imbalanced data preprocessing. (C) 2019 Elsevier B.V. All rights reserved.
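
The abstract describes estimating the imbalance distribution with radial basis functions and using that estimate to decide where synthetic minority objects should be placed. The Python sketch below illustrates that general idea only and is not the authors' published algorithm. The function names (`rbf_potential`, `rbf_guided_oversample`), the Gaussian kernel with Euclidean distance, the parameters `gamma`, `step`, and `n_steps`, and the acceptance rule (keep a random step when it lowers the absolute potential) are all assumptions made for illustration.

```python
import numpy as np

def rbf_potential(point, majority, minority, gamma=1.0):
    """Sum of Gaussian RBFs over majority samples minus the same sum over
    minority samples, evaluated at `point`. Positive values indicate a
    majority-dominated region, negative values a minority-dominated one.
    (Illustrative definition; kernel and distance are assumptions.)"""
    def total(samples):
        dist = np.linalg.norm(samples - point, axis=1)
        return float(np.exp(-(dist / gamma) ** 2).sum())
    return total(majority) - total(minority)

def rbf_guided_oversample(majority, minority, n_new,
                          gamma=1.0, step=0.1, n_steps=20, seed=0):
    """Generate `n_new` synthetic minority points. Each one starts at a
    randomly chosen minority sample and takes random steps, keeping a step
    only when it reduces the absolute potential (assumed rule)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        x = minority[rng.integers(len(minority))].astype(float)
        best = abs(rbf_potential(x, majority, minority, gamma))
        for _ in range(n_steps):
            candidate = x + rng.normal(scale=step, size=x.shape)
            score = abs(rbf_potential(candidate, majority, minority, gamma))
            if score < best:
                x, best = candidate, score
        synthetic.append(x)
    return np.array(synthetic)

# Toy data: four majority points near the origin, two minority points near (2, 2).
majority = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])
minority = np.array([[2.0, 2.0], [2.1, 2.0]])
new_points = rbf_guided_oversample(majority, minority, n_new=5)
```

Because the potential aggregates contributions from all samples of both classes, a single mislabeled or noisy point has limited influence on where synthetic objects end up, which is the kind of behavior the abstract contrasts with oversampling in a minority object's immediate neighborhood.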
Pages: 19-33
Number of pages: 15
Related papers
50 in total
  • [41] Evidence-based adaptive oversampling algorithm for imbalanced classification
    Chen-ju Lin
    Florence Leony
    Knowledge and Information Systems, 2024, 66 : 2209 - 2233
  • [42] A novel oversampling method based on SeqGAN for imbalanced text classification
    Luo, Yin
    Weng, Xuanlong
    Zheng, Huang
    Feng, Haishan
    Luang, Ke
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2891 - 2894
  • [43] Perturbation-based oversampling technique for imbalanced classification problems
    Jianjun Zhang
    Ting Wang
    Wing W. Y. Ng
    Witold Pedrycz
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 773 - 787
  • [44] Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification
    Vairetti, Carla
    Assadi, Jose Luis
    Maldonado, Sebastian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 246
  • [45] A novel oversampling and feature selection hybrid algorithm for imbalanced data classification
    Fang Feng
    Kuan-Ching Li
    Erfu Yang
    Qingguo Zhou
    Lihong Han
    Amir Hussain
    Mingjiang Cai
    Multimedia Tools and Applications, 2023, 82 : 3231 - 3267
  • [46] A Combined Priori and Purity Gaussian OverSampling Algorithm for Imbalanced Data Classification
    Tao, Liangliang
    Zhu, Huping
    Wang, Qingya
    Liang, Yage
    Deng, Xiaozheng
    IEEE ACCESS, 2023, 11 : 130688 - 130696
  • [47] CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
    Koziarski, Michal
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [48] Combining Random Subspace Approach with SMOTE Oversampling for Imbalanced Data Classification
    Ksieniewicz, Pawel
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2019, 2019, 11734 : 660 - 673
  • [49] Improved KD-tree based imbalanced big data classification and oversampling for MapReduce platforms
    Sleeman, William C.
    Roseberry, Martha
    Ghosh, Preetam
    Cano, Alberto
    Krawczyk, Bartosz
    APPLIED INTELLIGENCE, 2024, 54 (23) : 12558 - 12575
  • [50] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652