CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA

Citations: 0
Authors
Hu, Xiao-Sheng [1]
Zhang, Run-Jing [2]
Affiliations
[1] Foshan Univ, Coll Elect & Informat Engn, Foshan 528000, Peoples R China
[2] Foshan Univ, Informat & Educ Technol Ctr, Foshan 528000, Peoples R China
Source
PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4 | 2013
Keywords
Imbalanced data; Classification; Clustering; Ensemble learning;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification of imbalanced datasets has received considerable attention in recent research. Most classification algorithms tend to assign incoming data to the majority class, which results in poor classification performance on minority-class instances, even though these are usually of much greater interest. In this paper we propose a clustering-based subset ensemble learning method for handling the class imbalance problem. In the proposed approach, new balanced training datasets are first produced using clustering-based under-sampling; the new training sets are then classified by applying four algorithms (Decision Tree, Naive Bayes, KNN and SVM) as the base algorithms in combined bagging. An experimental analysis is carried out over a wide range of highly imbalanced datasets. The results show that our method improves classification performance on both rare and normal classes stably and effectively.
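The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no algorithmic details, so the use of plain k-means, the proportional per-cluster quota, simple majority voting, and every function name below (`kmeans`, `balanced_subsets`, `train_1nn`, `vote`) are assumptions made for illustration. A hypothetical 1-nearest-neighbour learner stands in for the four base algorithms.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def balanced_subsets(majority, minority, k, n_subsets, seed=0):
    """Clustering-based under-sampling (assumed scheme): each subset keeps all
    minority points (label 1) and draws about len(minority) majority points
    (label 0), spread over the clusters in proportion to cluster size."""
    rng = random.Random(seed)
    clusters = kmeans(majority, k, seed=seed)
    subsets = []
    for _ in range(n_subsets):
        sampled = []
        for cl in clusters:
            quota = max(1, round(len(minority) * len(cl) / len(majority)))
            sampled += rng.sample(cl, min(quota, len(cl)))
        subsets.append([(x, 0) for x in sampled] + [(x, 1) for x in minority])
    return subsets


def train_1nn(data):
    """Hypothetical stand-in base learner: 1-nearest-neighbour classifier."""
    def predict(x):
        return min(data, key=lambda d: sq_dist(x, d[0]))[1]
    return predict


def vote(models, x):
    """Combine the base models' predictions by simple majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]
```

In use, one model would be trained per balanced subset, for example `models = [train_1nn(s) for s in balanced_subsets(majority, minority, k=2, n_subsets=3)]`, and `vote(models, x)` would label a new point. Swapping `train_1nn` for a mix of different learners would be closer in spirit to the combined bagging the abstract mentions.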
Pages: 35-39
Page count: 5
Related papers
11 in total
[1]   Applying support vector machines to imbalanced datasets [J].
Akbani, R ;
Kwek, S ;
Japkowicz, N .
MACHINE LEARNING: ECML 2004, PROCEEDINGS, 2004, 3201 :39-50
[2]  
[Anonymous], 1999, Proceedings of the International Joint Conference on Artificial Intelligence
[3]  
Batista G. E., 2004, ACM SIGKDD Explor. Newslett., P20, DOI 10.1145/1007730.1007735
[4]   On the effectiveness of preprocessing methods when dealing with different levels of class imbalance [J].
Garcia, V. ;
Sanchez, J. S. ;
Mollineda, R. A. .
KNOWLEDGE-BASED SYSTEMS, 2012, 25 (01) :13-21
[5]   Learning from Imbalanced Data [J].
He, Haibo ;
Garcia, Edwardo A. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (09) :1263-1284
[6]  
Jie Song, 2009, Proceedings of the 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2009), P109, DOI 10.1109/FSKD.2009.608
[7]  
Juszczak Piotr., 2003, Proceedings of the ICML-2003 Workshop, P81
[8]  
Liu XY, 2006, IEEE DATA MINING, P965
[9]   A study in machine learning from imbalanced data for sentence boundary detection in speech [J].
Liu, Yang ;
Chawla, Nitesh V. ;
Harper, Mary R. ;
Shriberg, Elizabeth ;
Stolcke, Andreas .
COMPUTER SPEECH AND LANGUAGE, 2006, 20 (04) :468-494
[10]  
Raskutti B., 2004, Sigkdd Explorations, V6, P60, DOI 10.1145/1007730.1007739