Multi-window based ensemble learning for classification of imbalanced streaming data

Cited by: 1
Authors
Hu Li
Ye Wang
Hua Wang
Bin Zhou
Affiliations
[1] National University of Defense Technology, College of Computer
[2] Victoria University, Centre for Applied Informatics
[3] National University of Defense Technology, State Key Laboratory of High Performance Computing
Source
World Wide Web | 2017 / Vol. 20
Keywords
Streaming data; Class imbalance; Multi-window; Ensemble learning
DOI
Not available
Abstract
Imbalanced streaming data is common in real-world data mining and machine learning applications and has attracted much attention in recent years. In practice, class imbalance and streaming usually occur together; however, little research has addressed the two jointly. In this paper, we propose a multi-window based ensemble learning method for the classification of imbalanced streaming data. Three types of windows are defined to store the current batch of instances, the latest minority instances, and the ensemble classifier. The ensemble classifier consists of a set of the latest sub-classifiers together with the instances used to train each sub-classifier. All sub-classifiers are weighted before predicting the class labels of newly arriving instances, and new sub-classifiers are trained only when the precision falls below a predefined threshold. Extensive experiments on synthetic and real-world datasets demonstrate that the new approach classifies imbalanced streaming data efficiently and effectively, and generally outperforms existing approaches.
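The three-window scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration only, since the abstract does not specify them: the class and method names (`MultiWindowEnsemble`, `CentroidClassifier`, `process_batch`), the nearest-centroid sub-classifier, the rule that weights each sub-classifier by its accuracy on the latest labelled batch, the reading of "precision" as minority-class precision, and the default window sizes and threshold.

```python
from collections import deque


class CentroidClassifier:
    """Toy sub-classifier: predicts the class whose mean feature vector is nearest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class MultiWindowEnsemble:
    """Hypothetical sketch of the multi-window idea; all defaults are assumptions."""

    def __init__(self, minority_label=1, minority_size=50, ensemble_size=5, threshold=0.8):
        self.minority_label = minority_label
        self.threshold = threshold
        # Window 2: the latest minority instances, oldest dropped first.
        self.minority_window = deque(maxlen=minority_size)
        # Window 3: [sub-classifier, its training data, weight], oldest dropped first.
        self.ensemble_window = deque(maxlen=ensemble_size)

    def predict_one(self, x):
        """Weighted vote of all sub-classifiers; None if the ensemble is empty."""
        votes = {}
        for clf, _, weight in self.ensemble_window:
            label = clf.predict_one(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get) if votes else None

    def minority_precision(self, X, y):
        """Fraction of instances predicted as minority that truly are minority."""
        flagged = [t for x, t in zip(X, y) if self.predict_one(x) == self.minority_label]
        if not flagged:
            # Nothing flagged: perfect only if there was nothing to find.
            return 0.0 if self.minority_label in y else 1.0
        return sum(t == self.minority_label for t in flagged) / len(flagged)

    def process_batch(self, X, y):
        """Consume one labelled batch (window 1). Returns True if a sub-classifier was trained."""
        # Assumed weighting rule: accuracy of each sub-classifier on this batch.
        for member in self.ensemble_window:
            clf = member[0]
            member[2] = sum(clf.predict_one(x) == t for x, t in zip(X, y)) / len(y)

        retrain = not self.ensemble_window or self.minority_precision(X, y) < self.threshold
        if retrain:
            # Augment the batch with stored minority instances to offset the imbalance.
            train_X = list(X) + list(self.minority_window)
            train_y = list(y) + [self.minority_label] * len(self.minority_window)
            clf = CentroidClassifier().fit(train_X, train_y)
            self.ensemble_window.append([clf, (train_X, train_y), 1.0])

        # Remember this batch's minority instances for future retraining.
        self.minority_window.extend(x for x, t in zip(X, y) if t == self.minority_label)
        return retrain
```

In this sketch, a batch drawn from an unchanged distribution leaves the ensemble untouched, while a batch on which minority precision drops below the threshold adds one sub-classifier trained on that batch plus the stored minority instances. Training only on demand is what keeps the per-batch cost low.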
Pages: 1507-1525
Number of pages: 18