RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification

Cited by: 0
Authors
Michał Koziarski
Colin Bellinger
Michał Woźniak
Affiliations
[1] AGH University of Science and Technology, Department of Electronics
[2] National Research Council of Canada, Digital Technologies
[3] Wrocław University of Science and Technology, Department of Systems and Computer Networks
Source
Machine Learning | 2021 / Vol. 110
Keywords
Machine learning; Classification; Imbalanced data; Oversampling; Radial basis functions;
DOI
Not available
CLC number
Subject classification code
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling. The category sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be automatically selected via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms the state-of-the-art resampling methods in terms of AUC and G-mean.
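The abstract reports results under 5×2 cross-validation and in terms of G-mean. As a rough illustration only, the sketch below computes G-mean as the geometric mean of the recall on each class and generates index splits for five repetitions of 2-fold cross-validation. The function names `g_mean` and `five_by_two_splits` are hypothetical, and stratifying the folds by class is an assumption made here for imbalanced data; neither is taken from the paper's code.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall on the positive class and recall on the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def five_by_two_splits(labels, seed=0):
    """Yield (train_idx, test_idx) pairs: 5 repetitions of 2-fold CV.

    Assumption: folds are stratified by class so that each half keeps
    roughly the original class ratio.
    """
    rng = random.Random(seed)
    for _ in range(5):
        fold_a, fold_b = [], []
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            half = len(idx) // 2
            fold_a.extend(idx[:half])
            fold_b.extend(idx[half:])
        # Each half serves once as the training set and once as the test set.
        yield sorted(fold_a), sorted(fold_b)
        yield sorted(fold_b), sorted(fold_a)
```

In use, a resampler such as RB-CCR would be fitted on each training half only, and `g_mean` would be averaged over the ten resulting test halves.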
Pages: 3059 - 3093
Number of pages: 34
Related papers
50 in total
  • [21] A GEV-Based Classification Algorithm for Imbalanced Data
    Fu J.
    Liu G.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2018, 55 (11): : 2361 - 2371
  • [22] Majority-to-minority resampling for boosting-based classification under imbalanced data
    Gaoshan Wang
    Jian Wang
    Kejing He
    Applied Intelligence, 2023, 53 : 4541 - 4562
  • [23] A Combined Priori and Purity Gaussian OverSampling Algorithm for Imbalanced Data Classification
    Tao, Liangliang
    Zhu, Huping
    Wang, Qingya
    Liang, Yage
    Deng, Xiaozheng
    IEEE ACCESS, 2023, 11 : 130688 - 130696
  • [24] Majority-to-minority resampling for boosting-based classification under imbalanced data
    Wang, Gaoshan
    Wang, Jian
    He, Kejing
    APPLIED INTELLIGENCE, 2023, 53 (04) : 4541 - 4562
  • [25] Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning
    Ren, Siqi
    Zhu, Wen
    Liao, Bo
    Li, Zeng
    Wang, Peng
    Li, Keqin
    Chen, Min
    Li, Zejun
    KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 705 - 722
  • [26] A novel imbalanced data classification algorithm based on fuzzy rule
    Xu Z.-Y.
    Zhang Y.
    International Journal of Information and Communication Technology, 2019, 14 (03) : 373 - 384
  • [27] Imbalanced Data Classification Algorithm Based on Boosting and Cascade Model
    Zhang, Xiaolong
    Cheng, Chao
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2861 - 2866
  • [28] RBSP-Boosting: A Shapley value-based resampling approach for imbalanced data classification
    Chong, Weitu
    Chen, Ningjiang
    Fang, Chengyun
    INTELLIGENT DATA ANALYSIS, 2022, 26 (06) : 1579 - 1595
  • [29] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Shujuan Wang
    Yuntao Dai
    Jihong Shen
    Jingxue Xuan
    Scientific Reports, 11
  • [30] Research on expansion and classification of imbalanced data based on SMOTE algorithm
    Wang, Shujuan
    Dai, Yuntao
    Shen, Jihong
    Xuan, Jingxue
    SCIENTIFIC REPORTS, 2021, 11 (01)