A NOVEL RULE-BASED OVERSAMPLING APPROACH FOR IMBALANCED DATA CLASSIFICATION

Cited by: 0
Authors
Zhang, Xiao [1]
Paz, Ivan [1]
Nebot, Angela [1]
Affiliations
[1] Univ Politecn Cataluna, Soft Comp Res Grp, Intelligent Data Sci & Artificial Intelligence Re, Barcelona, Spain
Source
37TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE 2023, ESM 2023 | 2023
Keywords
Rule-based approach; Oversampling; Data synthesis; Imbalanced data; Classification; DATA-SETS; SMOTE;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
When confronted with imbalanced datasets, traditional classifiers frequently struggle to correctly categorize samples from the minority class, adversely impacting the overall predictive performance of machine learning models. Current oversampling techniques generally focus on data interpolation through neighbor selection, often neglecting to uncover underlying data structures and relationships. This study introduces a novel application for RuLer, an algorithm originally developed for identifying sound patterns in the artistic domain of live coding. When adapted for data oversampling (as Ad-RuLer), the algorithm shows significant promise in addressing the challenges associated with imbalanced class distribution. We undertake a thorough comparative evaluation of Ad-RuLer against established oversampling algorithms such as SMOTE, ADASYN, Tomek-links, Borderline-SMOTE, and KmeansSMOTE. The evaluation employs various classifiers including logistic regression, random forest, and XGBoost, and is conducted over six real-world biomedical datasets with varying degrees of imbalance.
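The "data interpolation through neighbor selection" that the abstract attributes to current oversampling techniques can be illustrated with a minimal sketch. This is a generic SMOTE-style interpolation, not Ad-RuLer (the abstract gives no algorithmic detail for it), and the function name `interpolate_minority` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def interpolate_minority(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation.

    Illustrative sketch only: each synthetic point lies on the segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbors (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a base sample by index so it can be excluded from its neighbors.
        i = rng.randrange(len(minority))
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(minority[i], minority[j]))
        j = rng.choice(others[:k])
        # Interpolate at a random position between base and neighbor.
        gap = rng.random()
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


if __name__ == "__main__":
    minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in interpolate_minority(minority_samples, n_new=3, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, this kind of method only fills in space between neighbors; it does not try to recover rules or structure in the data, which is the gap the abstract says the rule-based approach is meant to address.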
Pages: 208-212
Number of pages: 5