KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling

Cited by: 16
Authors
Ding, Hao [1]
Wei, Bin [2]
Gu, Zhaorui [1]
Yu, Zhibin [1]
Zheng, Haiyong [1,3]
Zheng, Bing [1]
Li, Juan [4]
Affiliations
[1] Ocean Univ China, Coll Informat Sci & Engn, Dept Elect Engn, Qingdao 266100, Peoples R China
[2] Qingdao Univ, Shandong Key Lab Digital Med & Comp Assisted Surg, Affiliated Hosp, Qingdao 266003, Peoples R China
[3] Univ Dundee, Sch Sci & Engn, Dept Math, Dundee DD1 4HN, Scotland
[4] Qingdao Agr Univ, Coll Mech & Elect Engn, Qingdao 266109, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class-imbalance learning; Under-sampling; Over-sampling; Ensemble learning; Image classification; REGRESSION; PREDICTION; SMOTE;
DOI
10.1007/s11042-019-07856-y
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Imbalanced learning has become a research emphasis in recent years because of the growing number of class-imbalance classification problems in real applications, and it is particularly challenging when the imbalance rate is very high. Sampling, including under-sampling and over-sampling, is an intuitive and popular way of dealing with class-imbalance problems: it regroups the original dataset and has proved effective. Its main deficiency is that under-sampling methods usually discard many majority-class examples, while over-sampling methods can easily cause over-fitting. In this paper, we propose a new algorithm, dubbed KA-Ensemble, that ensembles under-sampling and over-sampling to overcome this issue. KA-Ensemble extends the EasyEnsemble framework by randomly under-sampling the majority class while simultaneously over-sampling the minority class via kernel-based adaptive synthetic sampling (Kernel-ADASYN), yielding a group of balanced datasets on which corresponding classifiers are trained separately; the final result is obtained by voting over all trained classifiers. By combining under-sampling and over-sampling in this way, KA-Ensemble is well suited to class-imbalance problems with high imbalance rates. We evaluated the proposed method against state-of-the-art sampling methods on 9 image classification datasets with imbalance rates ranging from less than 2 to more than 15, and the experimental results show that KA-Ensemble performs better in terms of accuracy (ACC), F-Measure, G-Mean, and area under the ROC curve (AUC). Moreover, it can be applied to both binary and multi-class classification, on image classification as well as other class-imbalance problems.
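The Python sketch below illustrates the workflow the abstract describes, under stated assumptions: binary labels y in {0, 1} with class 1 as the minority, imbalanced-learn's plain ADASYN standing in for Kernel-ADASYN (that library ships no Kernel-ADASYN), and an arbitrary decision-tree base classifier. The function names, round count, and intermediate sampling ratio are illustrative, not taken from the paper.

```python
# A minimal sketch of the KA-Ensemble idea, not the authors' implementation.
# Assumptions (not from the paper): binary labels y in {0, 1} with class 1
# as the minority; plain ADASYN stands in for Kernel-ADASYN; the base
# classifier, round count, and under-sampling ratio are illustrative.
import numpy as np
from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import RandomUnderSampler
from sklearn.tree import DecisionTreeClassifier


def ka_ensemble_fit(X, y, n_rounds=10, random_state=0):
    """Train one classifier per balanced dataset, EasyEnsemble-style."""
    rng = np.random.RandomState(random_state)
    members = []
    for _ in range(n_rounds):
        seed = int(rng.randint(0, 2**31 - 1))
        # Randomly under-sample the majority class down to twice the
        # minority size (a fresh random subset each round, as in
        # EasyEnsemble)...
        X_u, y_u = RandomUnderSampler(
            sampling_strategy=0.5, random_state=seed
        ).fit_resample(X, y)
        # ...then over-sample the minority class with (Kernel-)ADASYN so
        # this round's training set ends up balanced.
        X_b, y_b = ADASYN(random_state=seed).fit_resample(X_u, y_u)
        members.append(DecisionTreeClassifier(random_state=seed).fit(X_b, y_b))
    return members


def ka_ensemble_predict(members, X):
    """Majority vote over all trained member classifiers."""
    votes = np.stack([clf.predict(X) for clf in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Under-sampling only partway (here to a 1:2 ratio) before letting the over-sampler close the remaining gap mirrors the paper's motivation: each round discards fewer majority examples than pure under-sampling, and needs fewer synthetic minority examples than pure over-sampling.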
Pages: 14871-14888
Number of pages: 18
Related Papers (50 in total)
[1]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2002, 16 :321-357
[2]   SMOTEBoost: Improving prediction of the minority class in boosting [J].
Chawla, NV ;
Lazarevic, A ;
Hall, LO ;
Bowyer, KW .
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS, 2003, 2838 :107-119
[3]  
Cherkassky V. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1997, 8 :1564, DOI 10.1109/TNN.1997.641482
[4]   On the optimality of the simple Bayesian classifier under zero-one loss [J].
Domingos, P ;
Pazzani, M .
MACHINE LEARNING, 1997, 29 (2-3) :103-130
[5]  
Drummond C. ICML Workshop on Learning from Imbalanced Data Sets II, 2003, 11 :1
[6]   A multiple resampling method for learning from imbalanced data sets [J].
Estabrooks, A ;
Jo, TH ;
Japkowicz, N .
COMPUTATIONAL INTELLIGENCE, 2004, 20 (01) :18-36
[7]  
Fan W. Machine Learning: Proceedings of the Sixteenth International Conference (ICML), 1999 :97
[8]  
Fanny. PROCEDIA COMPUTER SCIENCE, 2018, 135 :60, DOI 10.1016/j.procs.2018.08.150
[9]   A decision-theoretic generalization of on-line learning and an application to boosting [J].
Freund, Y ;
Schapire, RE .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 55 (01) :119-139
[10]  
Goodfellow I, Bengio Y, Courville A. Deep Learning [M]. MIT Press, Adaptive Computation and Machine Learning series, 2016.