Mining in anticipation for concept change: Proactive-reactive prediction in data streams

Cited by: 64
Authors
Yang, Ying [1]
Wu, Xindong
Zhu, Xingquan
Affiliations
[1] Monash Univ, Sch Comp Sci & Software Engn, Melbourne, Vic 3800, Australia
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
Keywords
data stream; concept change; classification; proactive learning; reactive learning; conceptual equivalence
DOI
10.1007/s10618-006-0050-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Prediction in streaming data is an important activity in modern society. Two major challenges posed by data streams are (1) the data may grow without limit, so that it is difficult to retain a long history of raw data; and (2) the underlying concept of the data may change over time. The novelties of this paper are fourfold. First, it uses a measure of conceptual equivalence to organize the data history into a history of concepts. This contrasts with the common practice of keeping only recent raw data. The concept history is compact while still retaining the information essential for learning. Second, it learns concept-transition patterns from the concept history and anticipates what the concept will be in the case of a concept change. It then proactively prepares a prediction model for the future change. This contrasts with the conventional methodology, which passively waits until the change happens. Third, it incorporates both proactive and reactive predictions. If the anticipation turns out to be correct, a proper prediction model can be launched instantly upon the concept change. If not, it promptly resorts to a reactive mode: adapting a prediction model to the new data. Finally, an efficient and effective system, RePro, is proposed to implement these new ideas. It carries out prediction at two levels: a general level of predicting each oncoming concept and a specific level of predicting each instance's class. Experiments compare RePro with representative existing prediction methods on various benchmark data sets that represent diverse scenarios of concept change. Empirical evidence offers inspiring insights and demonstrates that the proposed methodology is an advisable solution to prediction in data streams.
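The proactive-reactive idea described in the abstract can be illustrated with a minimal Python sketch. It is not the RePro implementation: the abstract does not say how transition patterns are learned, so the sketch assumes simple first-order transition counts. It also assumes each stretch of the stream has already been mapped to a concept id (the role the abstract assigns to the conceptual equivalence measure). The names `ConceptHistory`, `choose_model`, and `train_reactively` are hypothetical.

```python
from collections import Counter, defaultdict


class ConceptHistory:
    """Hypothetical helper: a compact history of concept ids, not raw data."""

    def __init__(self):
        self.sequence = []                       # concept ids in order of occurrence
        self.transitions = defaultdict(Counter)  # previous id -> counts of next ids

    def record(self, concept_id):
        """Append a concept; an immediate repeat means no change, so skip it."""
        if self.sequence and self.sequence[-1] == concept_id:
            return
        if self.sequence:
            self.transitions[self.sequence[-1]][concept_id] += 1
        self.sequence.append(concept_id)

    def anticipate(self):
        """Proactive step: most frequent successor of the current concept.

        Assumption of this sketch: first-order transition counts.
        Returns None when there is no evidence yet.
        """
        if not self.sequence:
            return None
        successors = self.transitions.get(self.sequence[-1])
        if not successors:
            return None
        return successors.most_common(1)[0][0]


def choose_model(history, models, new_concept_id, train_reactively):
    """Select a model after a change to new_concept_id has been detected.

    Returns (model, mode). mode is "proactive" when the anticipated concept
    matches and a prepared model exists; otherwise the sketch falls back to
    "reactive" mode and builds a model via the caller-supplied function.
    """
    anticipated = history.anticipate()
    history.record(new_concept_id)
    if anticipated == new_concept_id and new_concept_id in models:
        return models[new_concept_id], "proactive"
    model = train_reactively(new_concept_id)
    models[new_concept_id] = model
    return model, "reactive"
```

In this sketch, a stream that has alternated between two concepts leads `anticipate()` to return the other concept, so its stored model can be reused as soon as the change is detected. A previously unseen concept triggers the reactive branch instead.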
Pages: 261-289
Page count: 29