GOOWE: Geometrically Optimum and Online-Weighted Ensemble Classifier for Evolving Data Streams

Cited by: 30
Authors
Bonab, Hamed R. [1]
Can, Fazli [2]
Affiliations
[1] Bilkent Univ, Ankara, Turkey
[2] Bilkent Univ, Bilkent Informat Retrieval Grp, Comp Engn Dept, TR-06800 Ankara, Turkey
Keywords
Ensemble classifier; concept drift; evolving data stream; dynamic weighting; geometry of voting; least squares; spatial modeling for online ensembles; MAJORITY;
DOI
10.1145/3139240
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Designing adaptive classifiers for an evolving data stream is a challenging task due to the data size and its dynamically changing nature. Combining individual classifiers in an online setting, the ensemble approach, is a well-known solution. It is possible that a subset of classifiers in the ensemble outperforms others in a time-varying fashion. However, optimum weight assignment for component classifiers is a problem, which is not yet fully addressed in online evolving environments. We propose a novel data stream ensemble classifier, called Geometrically Optimum and Online-Weighted Ensemble (GOOWE), which assigns optimum weights to the component classifiers using a sliding window containing the most recent data instances. We map vote scores of individual classifiers and true class labels into a spatial environment. Based on the Euclidean distance between vote scores and ideal-points, and using the linear least squares (LSQ) solution, we present a novel, dynamic, and online weighting approach. While LSQ is used for batch mode ensemble classifiers, it is the first time that we adapt and use it for online environments by providing a spatial modeling of online ensembles. In order to show the robustness of the proposed algorithm, we use real-world datasets and synthetic data generators using the Massive Online Analysis (MOA) libraries. First, we analyze the impact of our weighting system on prediction accuracy through two scenarios. Second, we compare GOOWE with eight state-of-the-art ensemble classifiers in a comprehensive experimental environment. Our experiments show that GOOWE provides improved reactions to different types of concept drift compared to our baselines. The statistical tests indicate a significant improvement in accuracy, with conservative time and memory requirements.
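The weighting idea in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation: each classifier's vote scores form a vector over the classes, the ideal-point of an instance is assumed to be the one-hot vector of its true label, and the weights minimize the summed squared Euclidean distance between the weighted score combination and the ideal-points over the window. The function names (`solve_linear_system`, `window_weights`, `predict`) and the window layout are hypothetical, and the normal equations are solved with plain Gaussian elimination to keep the example dependency-free.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: classifier scores are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def window_weights(window):
    """Least-squares weights for the classifiers over a window of instances.

    window: list of (scores, label) pairs, where scores[i] is the score
    vector of classifier i over the classes and label is the true class index.
    Assumption: the ideal-point of an instance is the one-hot vector of label.
    """
    k = len(window[0][0])
    a = [[0.0] * k for _ in range(k)]
    d = [0.0] * k
    for scores, label in window:
        for i in range(k):
            # Dot product of a score vector with a one-hot ideal-point.
            d[i] += scores[i][label]
            for j in range(k):
                a[i][j] += sum(x * y for x, y in zip(scores[i], scores[j]))
    # Normal equations of the least-squares problem: a w = d.
    return solve_linear_system(a, d)


def predict(scores, weights):
    """Return the class index with the largest weighted combined score."""
    num_classes = len(scores[0])
    combined = [
        sum(w * s[c] for w, s in zip(weights, scores))
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=combined.__getitem__)
```

In a streaming setting, `window_weights` would be recomputed as the sliding window advances, so classifiers that currently match the true labels receive larger weights. How the window is maintained and when the weights are refreshed are not specified in the abstract and are left out of this sketch.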
Pages: 33
Related papers
54 in total
[1]  
[Anonymous], Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
[2]  
[Anonymous], 1999, The analysis of variance
[3]  
[Anonymous], 2001, THESIS
[4]  
Bifet A, 2010, LECT NOTES ARTIF INT, V6321, P135, DOI 10.1007/978-3-642-15880-3_15
[5]  
Bifet A, 2010, J MACH LEARN RES, V11, P1601
[6]  
Bifet A, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P139
[7]  
Bifet A, 2007, PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, P443
[8]   A Theoretical Framework on the Ideal Number of Classifiers for Online Ensembles in Data Streams [J].
Bonab, Hamed R. ;
Can, Fazli .
CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, :2053-2056
[9]   Combining block-based and online methods in learning ensembles from concept drifting data streams [J].
Brzezinski, Dariusz ;
Stefanowski, Jerzy .
INFORMATION SCIENCES, 2014, 265 :50-67
[10]   Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm [J].
Brzezinski, Dariusz ;
Stefanowski, Jerzy .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2014, 25 (01) :81-94