Data streams classification with ensemble model based on decision-feedback

Cited by: 0
Authors
LIU Jing [1, 2]
XU Guo-sheng [1, 2]
ZHENG Shi-hui [1, 2]
XIAO Da [1, 2]
GU Li-ze [1, 2]
Affiliations
[1] Information Security Center, Beijing University of Posts and Telecommunications
[2] National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications
Funding
National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities
Keywords
ensemble classification; novel class; concept drifting; decision-feedback
DOI
Not available
Chinese Library Classification number
TP315 [Management programs, management systems]
Discipline classification code
1201
Abstract
The main challenges of data stream classification include infinite length, concept drift, the arrival of novel classes, and a lack of labeled instances. Most existing techniques address only some of these challenges and ignore the others. This paper therefore presents an ensemble classification model based on decision-feedback (ECM-BDF) that addresses all of them. First, a data stream is divided into sequential chunks, and a classification model is trained on each labeled chunk. To address infinite length and concept drift, a fixed number of such models constitute an ensemble model E, and subsequent labeled chunks are used to update E. To deal with novel classes and limited labeled instances, the model incorporates a novel class detection mechanism that detects the arrival of a novel class without training E on labeled instances of that class. Meanwhile, unsupervised models are trained on unlabeled instances to provide useful constraints for E. With these constraints as feedback information, an extended ensemble model Ex is obtained, and unlabeled instances can then be classified more accurately by satisfying the maximum consensus of Ex. Experimental results demonstrate that the proposed ECM-BDF outperforms traditional techniques in classifying data streams with limited labeled data.
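The chunk-based ensemble update described in the abstract can be illustrated with a minimal sketch. This is not the authors' ECM-BDF: it omits novel class detection and the decision-feedback constraints, and the base learner (a toy nearest-centroid classifier), the replacement rule (keep the models that score best on the newest labeled chunk), and all names (`CentroidModel`, `ChunkEnsemble`, `max_models`) are assumptions made only for illustration.

```python
from collections import Counter
import math


class CentroidModel:
    """Toy nearest-centroid classifier trained on one labeled chunk.

    Hypothetical stand-in for the per-chunk base classifier; the abstract
    does not say which learner the paper uses.
    """

    def __init__(self, chunk):
        # chunk: list of (feature_tuple, label) pairs
        sums, counts = {}, Counter()
        for x, y in chunk:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {
            y: [s / counts[y] for s in acc] for y, acc in sums.items()
        }

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def accuracy(model, chunk):
    """Fraction of instances in the chunk that the model labels correctly."""
    return sum(model.predict(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    """Fixed-size ensemble E that is refreshed with every labeled chunk."""

    def __init__(self, max_models=3):
        self.max_models = max_models
        self.models = []

    def update(self, labeled_chunk):
        # Train a new model on the latest chunk, then keep the max_models
        # best scorers on that chunk (assumed rule). Models fitted to an
        # outdated concept score poorly and are dropped, which bounds memory
        # and tracks drift. Note the new model is scored on its own training
        # data, so this simple rule favors it.
        candidates = self.models + [CentroidModel(labeled_chunk)]
        candidates.sort(key=lambda m: accuracy(m, labeled_chunk), reverse=True)
        self.models = candidates[: self.max_models]

    def predict(self, x):
        # Plain majority vote over the current ensemble members.
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]
```

Calling `update` once per labeled chunk keeps the ensemble at no more than `max_models` members regardless of stream length, and `predict` can be called on unlabeled instances between updates.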
Pages: 79-85
Number of pages: 7