RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification

Cited by: 0
Authors
Michał Koziarski
Colin Bellinger
Michał Woźniak
Affiliations
[1] AGH University of Science and Technology, Department of Electronics
[2] National Research Council of Canada, Digital Technologies
[3] Wrocław University of Science and Technology, Department of Systems and Computer Networks
Source
Machine Learning | 2021 / Vol. 110
Keywords
Machine learning; Classification; Imbalanced data; Oversampling; Radial basis functions
DOI
Not available
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling. The category sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be automatically selected via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms the state-of-the-art resampling methods in terms of AUC and G-mean.
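The abstract does not define class potential, so the following is only an illustrative sketch of one common way to compute a radial-based potential: a sum of Gaussian radial basis functions centred on the samples of one class. The function name `class_potential`, the parameter `gamma`, and the toy data are assumptions made for this example and are not taken from the paper.

```python
import math


def class_potential(point, samples, gamma=1.0):
    """Hypothetical helper: sum of Gaussian RBFs centred on `samples`,
    evaluated at `point`. `gamma` controls the spread of each function.
    This form is an assumption; the abstract gives no formula."""
    total = 0.0
    for sample in samples:
        # Squared Euclidean distance between the query point and one sample.
        dist_sq = sum((p - q) ** 2 for p, q in zip(point, sample))
        total += math.exp(-dist_sq / gamma ** 2)
    return total


# Toy minority-class samples (made up for illustration only).
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]

# A point inside the cluster has a higher potential than a distant one.
near = class_potential((0.1, 0.1), minority)
far = class_potential((3.0, 3.0), minority)
```

Under this assumed definition, comparing the potential at candidate locations gives a rough way to tell dense regions of a class from sparse ones, which is the kind of information the abstract says RB-CCR uses to choose sub-regions for synthetic oversampling.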
Pages: 3059-3093
Number of pages: 34
Related papers
50 in total
  • [1] RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification
    Koziarski, Michal
    Bellinger, Colin
    Wozniak, Michal
    MACHINE LEARNING, 2021, 110 (11-12) : 3059 - 3093
  • [2] RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification
    Koziarski, Michal
    Bellinger, Colin
    Wozniak, Michal
    2021 IEEE 8TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2021,
  • [3] CCR: A COMBINED CLEANING AND RESAMPLING ALGORITHM FOR IMBALANCED DATA CLASSIFICATION
    Koziarski, Michal
    Wozniak, Michal
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2017, 27 (04) : 727 - 736
  • [4] Radial-Based Undersampling for imbalanced data classification
    Koziarski, Michal
    PATTERN RECOGNITION, 2020, 102
  • [5] Radial-Based oversampling for noisy imbalanced data classification
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    NEUROCOMPUTING, 2019, 343 : 19 - 33
  • [6] Radial-Based Oversampling for Multiclass Imbalanced Data Classification
    Krawczyk, Bartosz
    Koziarski, Michal
    Wozniak, Michal
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2818 - 2831
  • [7] Radial-Based Approach to Imbalanced Data Oversampling
    Koziarski, Michal
    Krawczyk, Bartosz
    Wozniak, Michal
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2017, 2017, 10334 : 318 - 327
  • [8] Combined Cleaning and Resampling algorithm for multi-class imbalanced data with label noise
    Koziarski, Michal
    Wozniak, Michal
    Krawczyk, Bartosz
    KNOWLEDGE-BASED SYSTEMS, 2020, 204 (204)
  • [9] A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets
    Xu Han
    Runbang Cui
    Yanfei Lan
    Yanzhe Kang
    Jiang Deng
    Ning Jia
    International Journal of Machine Learning and Cybernetics, 2019, 10 : 3687 - 3699
  • [10] A Gaussian mixture model based combined resampling algorithm for classification of imbalanced credit data sets
    Han, Xu
    Cui, Runbang
    Lan, Yanfei
    Kang, Yanzhe
    Deng, Jiang
    Jia, Ning
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (12) : 3687 - 3699