A NOVEL RULE-BASED OVERSAMPLING APPROACH FOR IMBALANCED DATA CLASSIFICATION

Times cited: 0
Authors
Zhang, Xiao [1]
Paz, Ivan [1]
Nebot, Angela [1]
Affiliations
[1] Univ Politecn Cataluna, Soft Comp Res Grp, Intelligent Data Sci & Artificial Intelligence Re, Barcelona, Spain
Source
37TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE 2023, ESM 2023 | 2023
Keywords
Rule-based approach; Oversampling; Data synthesis; Imbalanced data; Classification; DATA-SETS; SMOTE;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
When confronted with imbalanced datasets, traditional classifiers frequently struggle to correctly categorize samples from the minority class, adversely impacting the overall predictive performance of machine learning models. Current oversampling techniques generally focus on data interpolation through neighbor selection, often neglecting to uncover underlying data structures and relationships. This study introduces a novel application for RuLer, an algorithm originally developed for identifying sound patterns in the artistic domain of live coding. When adapted for data oversampling (as Ad-RuLer), the algorithm shows significant promise in addressing the challenges associated with imbalanced class distribution. We undertake a thorough comparative evaluation of Ad-RuLer against established oversampling algorithms such as SMOTE, ADASYN, Tomek-links, Borderline-SMOTE, and KmeansSMOTE. The evaluation employs various classifiers including logistic regression, random forest, and XGBoost, and is conducted over six real-world biomedical datasets with varying degrees of imbalance.
Pages: 208-212
Page count: 5
Related papers
12 in total
[1]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 :321-357
[2]   Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE [J].
Douzas, Georgios ;
Bacao, Fernando ;
Last, Felix .
INFORMATION SCIENCES, 2018, 465 :1-20
[3]   EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling [J].
Galar, Mikel ;
Fernandez, Alberto ;
Barrenechea, Edurne ;
Herrera, Francisco .
PATTERN RECOGNITION, 2013, 46 (12) :3460-3471
[4]   Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning [J].
Han, H ;
Wang, WY ;
Mao, BH .
ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 :878-887
[5]   ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning [J].
He, Haibo ;
Bai, Yang ;
Garcia, Edwardo A. ;
Li, Shutao .
2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, :1322-1328
[6]   Jo T., 2004, ACM SIGKDD Explorations Newsletter, V6, P40, DOI 10.1145/1007730.1007737
[7]   Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data [J].
Khan, Salman H. ;
Hayat, Munawar ;
Bennamoun, Mohammed ;
Sohel, Ferdous A. ;
Togneri, Roberto .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (08) :3573-3587
[8]  
Kubat M., 1997, P 14 INT C MACH LEAR, V97, P179
[9]   Evolutionary Cluster-Based Synthetic Oversampling Ensemble (ECO-Ensemble) for Imbalance Learning [J].
Lim, Pin ;
Goh, Chi Keong ;
Tan, Kay Chen .
IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47 (09) :2850-2861
[10]   Self-paced Ensemble for Highly Imbalanced Massive Data Classification [J].
Liu, Zhining ;
Cao, Wei ;
Gao, Zhifeng ;
Bian, Jiang ;
Chen, Hechang ;
Chang, Yi ;
Liu, Tie-Yan .
2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, :841-852