A NOVEL RULE-BASED OVERSAMPLING APPROACH FOR IMBALANCED DATA CLASSIFICATION

Cited by: 0
Authors
Zhang, Xiao [1]
Paz, Ivan [1]
Nebot, Angela [1]
Affiliations
[1] Univ Politecn Cataluna, Soft Comp Res Grp, Intelligent Data Sci & Artificial Intelligence Re, Barcelona, Spain
Source
37TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE 2023, ESM 2023 | 2023
Keywords
Rule-based approach; Oversampling; Data synthesis; Imbalanced data; Classification; DATA-SETS; SMOTE;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
When confronted with imbalanced datasets, traditional classifiers frequently struggle to correctly categorize samples from the minority class, adversely impacting the overall predictive performance of machine learning models. Current oversampling techniques generally focus on data interpolation through neighbor selection, often neglecting to uncover underlying data structures and relationships. This study introduces a novel application for RuLer, an algorithm originally developed for identifying sound patterns in the artistic domain of live coding. When adapted for data oversampling (as Ad-RuLer), the algorithm shows significant promise in addressing the challenges associated with imbalanced class distribution. We undertake a thorough comparative evaluation of Ad-RuLer against established oversampling algorithms such as SMOTE, ADASYN, Tomek-links, Borderline-SMOTE, and KmeansSMOTE. The evaluation employs various classifiers including logistic regression, random forest, and XGBoost, and is conducted over six real-world biomedical datasets with varying degrees of imbalance.
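The "data interpolation through neighbor selection" that the abstract attributes to current oversampling techniques can be illustrated with a minimal SMOTE-style sketch. This is not Ad-RuLer, whose procedure is not described in this record; the function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical, chosen only for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors.

    Illustrative sketch of SMOTE-style interpolation only; it is not the
    Ad-RuLer algorithm and not a replacement for a tested library.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment from base toward neighbor.
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class with two features (hypothetical values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay within the region already covered by the minority class. That is the sense in which such methods interpolate locally without modeling wider structure in the data.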
Pages: 208-212
Page count: 5
Related papers
50 in total
  • [11] A novel oversampling and feature selection hybrid algorithm for imbalanced data classification
    Feng, Fang
    Li, Kuan-Ching
    Yang, Erfu
    Zhou, Qingguo
    Han, Lihong
    Hussain, Amir
    Cai, Mingjiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (03) : 3231 - 3267
  • [12] Designing the rule classification with oversampling approach with high accuracy for imbalanced data in semiconductor production lines
    Wang, Hsiao-Yu
    Tsung, Chen-Kun
    Hung, Ching-Hua
    Chen, Chen-Huei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (25) : 36437 - 36452
  • [13] OVERSAMPLING METHOD FOR IMBALANCED CLASSIFICATION
    Zheng, Zhuoyuan
    Cai, Yunpeng
    Li, Ye
    COMPUTING AND INFORMATICS, 2015, 34 (05) : 1017 - 1037
  • [14] Integrated Oversampling for Imbalanced Time Series Classification
    Cao, Hong
    Li, Xiao-Li
    Woon, David Yew-Kwong
    Ng, See-Kiong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (12) : 2809 - 2822
  • [15] Gaussian Distribution Based Oversampling for Imbalanced Data Classification
    Xie, Yuxi
    Qiu, Min
    Zhang, Haibo
    Peng, Lizhi
    Chen, Zhenxiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 667 - 679
  • [16] Selective oversampling approach for strongly imbalanced data
    Gnip P.
    Vokorokos L.
    Drotár P.
    PeerJ Computer Science, 2021, 7 : 1 - 22
  • [17] Selective oversampling approach for strongly imbalanced data
    Gnip, Peter
    Vokorokos, Liberios
    Drotar, Peter
    PEERJ COMPUTER SCIENCE, 2021,
  • [18] Evolutionary rule-based systems for imbalanced data sets
    Albert Orriols-Puig
    Ester Bernadó-Mansilla
    Soft Computing, 2009, 13
  • [19] Evolutionary rule-based systems for imbalanced data sets
    Orriols-Puig, Albert
    Bernado-Mansilla, Ester
    SOFT COMPUTING, 2009, 13 (03) : 213 - 225
  • [20] A novel oversampling method based on SeqGAN for imbalanced text classification
    Luo, Yin
    Weng, Xuanlong
    Zheng, Huang
    Feng, Haishan
    Luang, Ke
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2891 - 2894