RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification

Authors
Michał Koziarski
Colin Bellinger
Michał Woźniak
Affiliations
[1] AGH University of Science and Technology, Department of Electronics
[2] National Research Council of Canada, Digital Technologies
[3] Wrocław University of Science and Technology, Department of Systems and Computer Networks
Source
Machine Learning | 2021 / Vol. 110
Keywords
Machine learning; Classification; Imbalanced data; Oversampling; Radial basis functions;
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling. The category sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be automatically selected via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally out-performs the state-of-the-art resampling methods in terms of AUC and G-mean.
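The abstract evaluates resampling by the precision-recall trade-off and by G-mean. As a minimal illustration only (not the authors' evaluation code), the sketch below computes these quantities from binary predictions, taking G-mean as the geometric mean of recall and specificity. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, specificity and G-mean for binary labels.

    Hypothetical helper for illustration; `positive` marks the minority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    # G-mean: geometric mean of minority recall and majority specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Toy imbalanced data (made up): 2 minority samples (1), 8 majority samples (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy data 8 of 10 predictions are correct, yet only half of the minority class is recovered. This is why recall-sensitive measures such as G-mean are commonly preferred over plain accuracy on imbalanced problems.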
Pages: 3059 - 3093
Page count: 34