RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification

Authors
Michał Koziarski
Colin Bellinger
Michał Woźniak
Affiliations
[1] AGH University of Science and Technology, Department of Electronics
[2] National Research Council of Canada, Digital Technologies
[3] Wrocław University of Science and Technology, Department of Systems and Computer Networks
Source
Machine Learning | 2021 / Vol. 110
Keywords
Machine learning; Classification; Imbalanced data; Oversampling; Radial basis functions
DOI
Not available
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data space for synthetic oversampling. The category of sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be selected automatically via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms the state-of-the-art resampling methods in terms of AUC and G-mean.
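The abstract does not define class potential. As a rough illustration only, the sketch below assumes a common radial-basis form: the potential at a point is the sum of Gaussian functions centred on each observation of a class. The function name `class_potential`, the spread parameter `gamma`, and the sample points are all hypothetical and are not taken from the record above.

```python
import math


def class_potential(point, class_points, gamma=1.0):
    """Sum of Gaussian radial basis functions centred on each class observation.

    Assumed form (not stated in the abstract): exp(-(distance / gamma) ** 2)
    summed over every observation in `class_points`.
    """
    total = 0.0
    for center in class_points:
        sq_dist = sum((p - c) ** 2 for p, c in zip(point, center))
        total += math.exp(-sq_dist / gamma ** 2)
    return total


# Hypothetical 2-D minority observations forming a small cluster.
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1)]

near = class_potential((0.1, 0.0), minority)  # inside the cluster
far = class_potential((3.0, 3.0), minority)   # far from the cluster
print(near > far)
```

Under this assumption, regions close to many observations of a class have high potential and remote regions have low potential, which is the kind of signal a resampler could use to choose where synthetic points are placed.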
Pages: 3059-3093
Page count: 34