On the Performance of Oversampling Techniques for Class Imbalance Problems

Cited by: 7
Authors
Kong, Jiawen [1]
Rios, Thiago [2]
Kowalczyk, Wojtek [1]
Menzel, Stefan [2]
Back, Thomas [1]
Affiliations
[1] Leiden Univ, Leiden, Netherlands
[2] Honda Res Inst Europe GmbH, Offenbach, Germany
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II | 2020, Vol. 12085
Funding
EU Horizon 2020
Keywords
Class imbalance; Minority class distribution; Data complexity measures; SAMPLING APPROACH; CLASSIFICATION; ALGORITHMS;
DOI
10.1007/978-3-030-47436-2_7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although over 90 oversampling approaches have been developed in the imbalance learning domain, most empirical studies and applications still rely on the "classical" resampling techniques. In this paper, several experiments on 19 benchmark datasets are set up to study the efficiency of six powerful oversampling approaches, including both "classical" and new ones. According to our experimental results, oversampling techniques that consider the minority class distribution (the new ones) perform better in most cases, and RACOG gives the best performance among the six reviewed approaches. We further validate our conclusion on our real-world inspired vehicle datasets and also find that applying oversampling techniques can improve performance by around 10%. In addition, seven data complexity measures are considered, with the initial purpose of investigating the relationship between data complexity measures and the choice of resampling technique. Although no obvious relationship could be extracted in our experiments, we find that the F1v value, a measure of class overlap that most researchers ignore, has a strong negative correlation with the potential AUC value (after resampling).
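For readers unfamiliar with oversampling, the following is a minimal illustrative sketch of a SMOTE-style interpolation step, assuming SMOTE as a representative "classical" technique (the abstract does not name the six approaches it compares). It is not the authors' implementation; the function name `smote_like_oversample`, its parameters, and the toy data are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Each synthetic point lies on the line segment between two existing minority samples, which is why this family of methods only uses local neighbourhood information rather than a model of the whole minority class distribution.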
Pages: 84-96
Number of pages: 13