A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

Times cited: 5
Authors
Song, Ge [1]
Ye, Yunming [1]
Affiliation
[1] Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen 518055, People's Republic of China
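Keywords
Classification
DOI
10.1155/2014/497354
Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract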
Textual stream classification has become a realistic and challenging problem, because large-scale, high-dimensional, and non-stationary streams with class imbalance arise in many real-life applications. These characteristics make textual streams technically difficult to classify, especially in an imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest for learning from imbalanced textual streams with concept drift (CFIM). CFIM is an ensemble learner that integrates a set of clustering trees (CTs), and it includes an adaptive selection method that flexibly chooses useful CTs according to the properties of the stream. In particular, to deal with class imbalance, we collect and reuse both rare-class instances and misclassified instances from historical chunks. Unlike most existing approaches, ours assumes that both the majority class and the rare class may suffer from concept drift. As a result, the distribution of the resampled instances is similar to the current concept. The effectiveness of CFIM is examined on five real-world textual streams in an imbalanced, non-stationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models.
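The abstract does not spell out how historical instances are reused. As a rough illustration only, the sketch below shows a generic collect-and-reuse step for a chunk-based stream. It assumes that instances are (feature vector, label) pairs, that there is a single known rare label, and that the buffer is a bounded first-in-first-out queue. `HistoryBuffer`, `update`, `training_set`, and `max_size` are hypothetical names. The code does not build clustering trees, implement CFIM's adaptive selection, or reproduce how the authors keep resampled instances close to the current concept.

```python
from collections import deque
from typing import Callable, Deque, Hashable, List, Sequence, Tuple

# An instance is a (feature vector, class label) pair.
Instance = Tuple[Sequence[float], Hashable]
Predictor = Callable[[Sequence[float]], Hashable]


class HistoryBuffer:
    """Hypothetical helper (not from the paper): stores rare-class and
    misclassified instances seen in past chunks for reuse in later training."""

    def __init__(self, rare_label: Hashable, max_size: int = 1000) -> None:
        self.rare_label = rare_label
        # Assumption: a bounded FIFO. Once full, the oldest instances are
        # evicted first, which biases the buffer toward recent data.
        self.items: Deque[Instance] = deque(maxlen=max_size)

    def update(self, chunk: List[Instance], predict: Predictor) -> None:
        """Keep every rare-class instance and every instance that the
        current model `predict` gets wrong; each is stored at most once."""
        for x, y in chunk:
            if y == self.rare_label or predict(x) != y:
                self.items.append((x, y))

    def training_set(self, chunk: List[Instance]) -> List[Instance]:
        """Training data for the next model: the new chunk plus reused history."""
        return list(chunk) + list(self.items)


def always_majority(x: Sequence[float]) -> Hashable:
    """Stand-in model for the demo: always predicts the majority label 0."""
    return 0


# Toy usage: label 1 is the rare class.
buf = HistoryBuffer(rare_label=1, max_size=3)
buf.update([([0.1], 0), ([0.9], 1), ([0.2], 0)], always_majority)
print(len(buf.training_set([([0.3], 0), ([0.8], 1)])))  # 3
```

In the toy run, only the rare-class instance from the first chunk is stored, because the stand-in model classifies both majority instances correctly. FIFO eviction is simply one way to favor recent data. It is a placeholder and not the drift-handling mechanism described in the paper.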
Pages: 11