Efficient instance-based learning on data streams

Cited by: 42
Authors
Beringer, Juergen [1]
Huellermeier, Eyke [2]
Affiliations
[1] Univ Magdeburg, Dept Comp Sci, D-39106 Magdeburg, Germany
[2] Univ Marburg, Dept Math & Comp Sci, D-35032 Marburg, Germany
Keywords
data streams; classification; instance-based learning; concept drift
DOI
10.3233/IDA-2007-11604
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The processing of data streams in general and the mining of such streams in particular have recently attracted considerable attention in various research fields. A key problem in stream mining is to extend existing machine learning and data mining methods so as to meet the increased requirements imposed by the data stream scenario, including the ability to analyze incoming data in an online, incremental manner, to observe tight time and memory constraints, and to appropriately respond to changes of the data characteristics and underlying distributions, amongst others. This paper considers the problem of classification on data streams and develops an instance-based learning algorithm for that purpose. The experimental studies presented in the paper suggest that this algorithm has a number of desirable properties that are not, at least not as a whole, shared by currently existing alternatives. Notably, our method is very flexible and thus able to adapt to an evolving environment quickly, a point of utmost importance in the data stream context. At the same time, the algorithm is relatively robust and thus applicable to streams with different characteristics.
Pages: 627-650
Page count: 24