An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams

Cited by: 60
Authors
Hosseini, Mohammad Javad [1]
Gholipour, Ameneh [1]
Beigy, Hamid [1]
Affiliations
[1] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran
Keywords
Semi-supervised learning; Ensemble learning; Data streams; Concept drift; Cluster assumption
DOI
10.1007/s10115-015-0837-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances in storage and processing have made it possible to gather information automatically, which in turn produces fast, continuous flows of data. Data produced and stored in this way are called data streams. Data streams are large and highly dynamic, and their properties make them a suitable model for many real-world data mining applications. The main challenge of streaming data is the occurrence of concept drift. In addition, because labeling instances is costly, it is often assumed that only a small fraction of instances are labeled. In this paper, we propose an ensemble algorithm for classifying instances of non-stationary data streams in a semi-supervised setting. The method is also intended to recognize recurring concept drifts in data streams. The algorithm maintains a pool of classifiers, each representing a single concept. First, the algorithm classifies a batch of instances. Some of these instances are then labeled, and the partially labeled batch is used to update the classifiers in the pool. This process repeats for consecutive batches of the stream. The main advantage of the algorithm is that it uses unlabeled instances as well as labeled ones in the learning task. Experimental results show the effectiveness of the proposed algorithm compared with state-of-the-art methods in several respects.
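
The batch loop described in the abstract (classify a batch, receive a few labels, update a pool with one classifier per concept) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the classifier or the concept-matching rule, so the nearest-centroid model, the names `ConceptModel` and `process_stream`, and the accuracy `threshold` are all assumptions made for illustration only.

```python
import math


class ConceptModel:
    """Hypothetical cluster-based classifier: one running-mean centroid per class."""

    def __init__(self):
        self.centroids = {}  # label -> centroid coordinates
        self.counts = {}     # label -> number of points absorbed

    def predict(self, x):
        """Return the label of the nearest centroid, or None if untrained."""
        if not self.centroids:
            return None
        return min(self.centroids, key=lambda lab: math.dist(self.centroids[lab], x))

    def _absorb(self, x, label):
        """Move the centroid of `label` toward x (incremental mean)."""
        n = self.counts.get(label, 0)
        c = self.centroids.get(label, [0.0] * len(x))
        self.centroids[label] = [(ci * n + xi) / (n + 1) for ci, xi in zip(c, x)]
        self.counts[label] = n + 1

    def update(self, batch):
        """Learn from labeled points first, then from unlabeled points.

        Unlabeled points take the label of their nearest centroid, a crude
        stand-in for the cluster assumption.
        """
        for x, y in batch:
            if y is not None:
                self._absorb(x, y)
        for x, y in batch:
            if y is None and self.centroids:
                self._absorb(x, self.predict(x))

    def accuracy(self, batch):
        """Accuracy on the labeled part of a batch (0.0 if nothing to score)."""
        labeled = [(x, y) for x, y in batch if y is not None]
        if not labeled or not self.centroids:
            return 0.0
        return sum(self.predict(x) == y for x, y in labeled) / len(labeled)


def process_stream(batches, threshold=0.7):
    """Classify each batch, then update the pool with its partial labels.

    `threshold` is an assumed cut-off: if no pooled model reaches it on the
    labeled instances, a new model is created for a presumed new concept.
    """
    pool, active, predictions = [], None, []
    for features, partial_labels in batches:
        # 1) Classify the batch before any of its labels are seen.
        predictions.append([active.predict(x) if active else None for x in features])
        # 2) A fraction of the labels arrives (None marks unlabeled instances).
        batch = list(zip(features, partial_labels))
        # 3) Reuse the best-matching pooled model, or start a new one.
        best = max(pool, key=lambda m: m.accuracy(batch), default=None)
        if best is None or best.accuracy(batch) < threshold:
            best = ConceptModel()
            pool.append(best)
        best.update(batch)
        active = best
    return pool, predictions
```

In this sketch a recurring concept is handled by reusing the pooled model that best fits the newly labeled instances instead of training a new one, so the pool grows only when no stored model explains the current batch.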
Pages: 567-597
Number of pages: 31