RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification

Authors
Michał Koziarski
Colin Bellinger
Michał Woźniak
Affiliations
[1] AGH University of Science and Technology, Department of Electronics
[2] National Research Council of Canada, Digital Technologies
[3] Wrocław University of Science and Technology, Department of Systems and Computer Networks
Source
Machine Learning | 2021 / Vol. 110
Keywords
Machine learning; Classification; Imbalanced data; Oversampling; Radial basis functions
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and asymmetric misclassification costs. In such cases, the classification model must achieve high recall without significantly reducing precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, state-of-the-art methods either ignore the local joint distribution of the data or correct it only as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR uses the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data space for synthetic oversampling. The category of sub-region used for oversampling can be specified as an input parameter to meet domain-specific needs or selected automatically via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms state-of-the-art resampling methods in terms of AUC and G-mean.
Pages: 3059 - 3093
Page count: 34
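The abstract refers to a "class potential" built on radial basis functions but does not give its formula. As a rough illustration only, the sketch below assumes one common form: a sum of Gaussian radial basis functions centred on the observations of a single class. The function name `class_potential`, the spread parameter `gamma`, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def class_potential(point, samples, gamma=1.0):
    """Assumed form of a class potential: the sum of Gaussian RBFs
    centred on each sample of one class, evaluated at `point`.

    `gamma` (hypothetical parameter) controls the spread of each RBF.
    """
    point = np.asarray(point, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Euclidean distance from `point` to every sample of the class.
    distances = np.linalg.norm(samples - point, axis=1)
    return float(np.sum(np.exp(-(distances / gamma) ** 2)))


# Toy minority-class observations (illustrative data only).
minority = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])

# A point close to the minority cluster has a higher potential
# than a point far away from it.
near = class_potential([0.1, 0.1], minority)
far = class_potential([5.0, 5.0], minority)
```

Under this assumed form, comparing the potential at candidate locations gives one way to distinguish regions of the data space by how strongly a class is represented there, which is the kind of information the abstract says RB-CCR uses to choose where to oversample.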