RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification

Authors
Michał Koziarski
Colin Bellinger
Michał Woźniak
Affiliations
[1] AGH University of Science and Technology, Department of Electronics
[2] National Research Council of Canada, Digital Technologies
[3] Wrocław University of Science and Technology, Department of Systems and Computer Networks
Source
Machine Learning, 2021, Vol. 110
Keywords
Machine learning; Classification; Imbalanced data; Oversampling; Radial basis functions
DOI
Not available
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asymmetric misclassification costs. In such cases, the classification model must achieve high recall without significantly reducing precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, state-of-the-art methods either ignore the local joint distribution of the data or correct it only as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR uses the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data space for synthetic oversampling. The type of sub-region targeted for oversampling can be specified as an input parameter to meet domain-specific needs, or selected automatically via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms state-of-the-art resampling methods in terms of AUC and G-mean.
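The abstract describes the core mechanism only in words: candidate synthetic points are scored by their class potential, and only those lying in a chosen sub-region are kept. The sketch below illustrates that idea under our own assumptions and is not the paper's exact procedure. We assume a Gaussian radial basis potential in which each majority sample contributes positively and each minority sample negatively (spread `gamma`), a fixed sphere `radius` around a minority seed point (CCR derives the radius from an energy budget, and its cleaning step is omitted here), and hypothetical region labels `"L"`, `"E"` and `"H"` for keeping the lowest-, middle- or highest-potential candidates. All function names and the `candidate_factor` parameter are illustrative.

```python
import numpy as np


def class_potential(points, majority, minority, gamma):
    """Assumed mutual class potential: Gaussian RBF mass of the majority
    class minus that of the minority class, evaluated at each point."""
    def rbf_sum(reference):
        # Pairwise squared Euclidean distances, shape (n_points, n_reference).
        d2 = ((points[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / gamma ** 2).sum(axis=1)

    return rbf_sum(majority) - rbf_sum(minority)


def sample_in_sphere(center, radius, n, rng):
    """Draw n points uniformly from the ball of the given radius around center."""
    d = center.shape[0]
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling by U**(1/d) makes points uniform in volume, not clustered at the center.
    radii = radius * rng.random(n) ** (1.0 / d)
    return center + directions * radii[:, None]


def select_region(candidates, potentials, region, k):
    """Keep k candidates from the low ("L"), middle ("E") or high ("H")
    end of the potential ranking (hypothetical region labels)."""
    order = np.argsort(potentials, kind="stable")
    if region == "L":
        idx = order[:k]
    elif region == "H":
        idx = order[len(order) - k:]
    elif region == "E":
        start = (len(order) - k) // 2
        idx = order[start:start + k]
    else:
        raise ValueError(f"unknown region {region!r}")
    return candidates[idx]


def rb_oversample_point(seed_point, radius, n_new, majority, minority,
                        gamma, region, rng, candidate_factor=3):
    """Generate n_new synthetic points around one minority seed point:
    draw candidate_factor * n_new candidates inside its sphere, score them
    by class potential, and keep those in the requested region."""
    candidates = sample_in_sphere(seed_point, radius, candidate_factor * n_new, rng)
    potentials = class_potential(candidates, majority, minority, gamma)
    return select_region(candidates, potentials, region, n_new)


# Toy 2-D data: majority around the origin, minority around (3, 3).
data_rng = np.random.default_rng(0)
majority = data_rng.normal(loc=0.0, size=(50, 2))
minority = data_rng.normal(loc=3.0, size=(10, 2))
low = rb_oversample_point(minority[0], 1.0, 5, majority, minority,
                          gamma=1.0, region="L", rng=np.random.default_rng(1))
high = rb_oversample_point(minority[0], 1.0, 5, majority, minority,
                           gamma=1.0, region="H", rng=np.random.default_rng(1))
print(low.shape, high.shape)
```

Because both calls use the same seed, they draw identical candidates and differ only in which end of the potential ranking they keep. Under the assumed sign convention, `"L"` favors points surrounded mostly by minority samples and `"H"` favors points closer to the majority class; choosing among such regions is the kind of parameter the abstract says can be fixed by hand or tuned by cross-validation.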
Pages: 3059-3093
Page count: 34