Radial-Based oversampling for noisy imbalanced data classification

Cited by: 94
Authors
Koziarski, Michal [1]
Krawczyk, Bartosz [2]
Wozniak, Michal [3]
Affiliations
[1] AGH Univ Sci & Technol, Dept Elect, Al Mickiewicza 30, PL-30059 Krakow, Poland
[2] Virginia Commonwealth Univ, Dept Comp Sci, 401 West Main St,POB 843019, Richmond, VA 23284 USA
[3] Wroclaw Univ Sci & Technol, Dept Syst & Comp Networks, Wybrzeze Wyspianskiego 27, PL-50370 Wroclaw, Poland
Keywords
Pattern classification; Machine learning; Imbalanced data; Oversampling; Radial basis functions; Noisy data; SAMPLING METHOD; MINORITY CLASS; SMOTE; IDENTIFICATION; EXAMPLES;
DOI
10.1016/j.neucom.2018.04.089
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Imbalanced data classification remains a focus of intense research, mostly due to the prevalence of data imbalance in various real-life application domains. A disproportion among objects from different classes may significantly affect the performance of standard classification models. The first problem is a high imbalance ratio, which poses a serious learning difficulty and requires dedicated methods capable of alleviating it. The second important problem is noise, which may accompany the training data and cause strong deterioration of classifier performance or increase the time required for training. Therefore, a desirable classification model should be robust to both skewed data distributions and noise. One of the most popular approaches for handling imbalanced data is oversampling of the minority objects in their neighborhood. In this work we criticize this approach and propose a novel strategy for dealing with imbalanced data, with particular focus on the presence of noise. We propose the Radial-Based Oversampling (RBO) method, which can find regions in which synthetic objects from the minority class should be generated on the basis of the imbalance distribution estimation with radial basis functions. Results of experiments, carried out on a representative set of benchmark datasets, confirm that the proposed guided synthetic oversampling algorithm offers an interesting alternative to popular state-of-the-art solutions for imbalanced data preprocessing. (C) 2019 Elsevier B.V. All rights reserved.
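The abstract does not give the formulas behind RBO, so the following is only an illustrative sketch of the general idea of using radial basis functions to decide where synthetic minority objects might be placed. The function names `rbf_potential` and `guided_oversample`, the Gaussian kernel, the parameters `gamma`, `step`, and `n_steps`, and the accept-if-lower random walk are all assumptions made for this example. They are not taken from the paper.

```python
import numpy as np

def rbf_potential(point, majority, minority, gamma=1.0):
    """Hypothetical score built from Gaussian radial basis functions:
    positive where majority objects dominate, negative where minority
    objects dominate. Not the paper's stated definition."""
    d_maj = np.linalg.norm(majority - point, axis=1)
    d_min = np.linalg.norm(minority - point, axis=1)
    return np.exp(-(d_maj / gamma) ** 2).sum() - np.exp(-(d_min / gamma) ** 2).sum()

def guided_oversample(majority, minority, n_new, gamma=1.0,
                      step=0.1, n_steps=20, seed=0):
    """Create n_new synthetic minority points. Each starts at a random
    minority object and takes small random steps, keeping a step only if
    it lowers the score (an assumed heuristic for illustration)."""
    majority = np.asarray(majority, dtype=float)
    minority = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        point = minority[rng.integers(len(minority))].copy()
        score = rbf_potential(point, majority, minority, gamma)
        for _ in range(n_steps):
            candidate = point + rng.normal(scale=step, size=point.shape)
            cand_score = rbf_potential(candidate, majority, minority, gamma)
            if cand_score < score:  # move away from majority-dominated space
                point, score = candidate, cand_score
        synthetic.append(point)
    return np.array(synthetic)

# Toy data: a majority cluster near the origin, a minority cluster near (3, 3).
majority = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
minority = np.array([[3.0, 3.0], [3.1, 3.0]])
new_points = guided_oversample(majority, minority, n_new=4)
```

In this sketch the score, rather than a nearest-neighbor interpolation, decides where new points end up, which is the contrast with neighborhood-based oversampling that the abstract draws.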
Pages: 19-33
Number of pages: 15
Related papers
50 in total
  • [21] A quantum-based oversampling method for classification of highly imbalanced and overlapped data
    Yang, Bei
    Tian, Guilan
    Luttrell, Joseph
    Gong, Ping
    Zhang, Chaoyang
    EXPERIMENTAL BIOLOGY AND MEDICINE, 2023, 248 (24) : 2500 - 2513
  • [22] OVERSAMPLING METHOD FOR IMBALANCED CLASSIFICATION
    Zheng, Zhuoyuan
    Cai, Yunpeng
    Li, Ye
    COMPUTING AND INFORMATICS, 2015, 34 (05) : 1017 - 1037
  • [23] Software quality classification with imbalanced and noisy data
    Folleco, Andres
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    THIRTEENTH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN, PROCEEDINGS, 2007, : 191 - +
  • [24] Model-Based Oversampling for Imbalanced Sequence Classification
    Gong, Zhichen
    Chen, Huanhuan
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1009 - 1018
  • [25] Counterfactual-based minority oversampling for imbalanced classification
    Wang, Shu
    Luo, Hao
    Huang, Shanshan
    Li, Qingsong
    Liu, Li
    Su, Guoxin
    Liu, Ming
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122
  • [26] An oversampling framework for imbalanced classification based on Laplacian eigenmaps
    Ye, Xiucai
    Li, Hongmin
    Imakura, Akira
    Sakurai, Tetsuya
    NEUROCOMPUTING, 2020, 399 : 107 - 116
  • [27] Imbalanced Learning with Oversampling based on Classification Contribution Degree
    Jiang, Zhenhao
    Yang, Jie
    Liu, Yan
    ADVANCED THEORY AND SIMULATIONS, 2021, 4 (05)
  • [28] Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data
    Liu, Jie
    SOFT COMPUTING, 2022, 26 (03) : 1141 - 1163
  • [30] Multi-oversampling with Evidence Fusion for Imbalanced Data Classification
    Tian, Hongpeng
    Zhang, Zuowei
    Liu, Zhunga
    Zuo, Jingwei
    BELIEF FUNCTIONS: THEORY AND APPLICATIONS, BELIEF 2024, 2024, 14909 : 68 - 77