RRBoost: a new ensemble method for classifying imbalanced data

Cited by: 0
Authors
Park, Hyejoon [1, 2]
Kim, Hyunjoong [3]
Affiliations
[1] Catholic Univ Korea, Coll Med, Dept Pharmacol, 222 Banpo Daero, Seoul 06591, South Korea
[2] Catholic Univ Korea, Pharmacometr Inst Pract Educ & Training PIPET, Coll Med, 222 Banpo Daero, Seoul 06591, South Korea
[3] Yonsei Univ, Dept Appl Stat, 50 Yonsei Ro, Seoul 03722, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Imbalanced data; Classification; Oversampling; Undersampling; Hybrid sampling; Ensemble; SMOTE;
DOI
10.1007/s10115-025-02490-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Class imbalance is a common issue in classification tasks, often causing standard classification models to misclassify instances from minority classes. Recent efforts to address this problem frequently combine sampling techniques with ensemble models. Extending this trend, we propose a new method called R-ROSE Boosting (RRBoost). The method introduces a novel synthetic data generation technique, radius-random oversampling examples (R-ROSE), and integrates it with a boosting-based ensemble method. This approach increases the diversity of synthetic data in the vicinity of hard-to-classify observations, thereby potentially improving classification accuracy. We evaluate RRBoost by comparing it with other ensemble models on 24 real imbalanced datasets. In these comparisons, RRBoost shows superior classification performance relative to the other ensemble models, indicating that it is an effective method for addressing imbalanced data.
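The abstract gives no algorithmic details for R-ROSE, so the following is only an illustrative sketch of the general idea it describes: generating synthetic minority points close to observations that a boosting procedure currently finds hard to classify. The function names, the `radius` parameter, and the rule of sampling uniformly inside a ball around a weight-selected seed are all assumptions made for illustration; they are not the authors' method.

```python
import math
import random


def sample_in_ball(center, radius, rng):
    """Draw one point uniformly from the ball of the given radius around center."""
    d = len(center)
    # Random direction: normalize a standard Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    # Scale the distance by U^(1/d) so that points are uniform in volume.
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * x / norm for c, x in zip(center, g)]


def oversample_hard_points(minority, weights, n_new, radius, seed=0):
    """Hypothetical helper: pick minority seeds in proportion to their
    boosting weights, then draw one synthetic point near each seed."""
    rng = random.Random(seed)
    seeds = rng.choices(minority, weights=weights, k=n_new)
    return [sample_in_ball(s, radius, rng) for s in seeds]


# Toy usage: the third point carries most of the weight, as a
# hard-to-classify observation would after a few boosting rounds.
minority = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
weights = [0.1, 0.1, 0.8]
synthetic = oversample_hard_points(minority, weights, n_new=20, radius=0.5)
```

In a boosting loop, a step like this would be repeated each round with the updated observation weights, so that later rounds draw more synthetic points around the observations that earlier learners misclassified.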
Pages: 23