Active mining of data streams

Authors
Fan, W [1]
Huang, YA [1]
Wang, HX [1]
Yu, PS [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Hawthorne, NY 10532 USA
Source
PROCEEDINGS OF THE FOURTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2004
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most previously proposed methods for mining data streams make the unrealistic assumption that a "labelled" data stream is readily available and can be mined at any time. However, in most real-world problems, labelled data streams are rarely immediately available. For this reason, models are refreshed periodically, usually on a schedule synchronized with data availability. This "passive periodic refresh" has several undesirable consequences. In this paper, we propose a new concept of demand-driven active data mining. It estimates the error of the model on the new data stream without knowing the true class labels. When a significantly higher error is suspected, it investigates the true class labels of a selected number of examples in the most recent data stream to verify the suspected higher error.
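The abstract describes a two-step loop: estimate the model's error on new, unlabelled data, then request true labels for a few examples only when that estimate is suspiciously high. The abstract does not say how the error is estimated or how examples are selected, so the following is a minimal sketch under explicit assumptions, not the authors' method. The label-free estimate here is a generic confidence proxy (mean of 1 minus the highest class probability), examples are picked by uniform random sampling, and the names `estimate_error`, `active_refresh_check`, `oracle`, `margin`, `sample_size` and `toy_proba` are all hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A probability model maps one example to [P(class 0), P(class 1), ...].
ProbaFn = Callable[[float], List[float]]


def estimate_error(predict_proba: ProbaFn, batch: Sequence[float]) -> float:
    """Label-free error estimate: mean of (1 - highest class probability).

    Assumption: this confidence-based proxy stands in for the paper's
    estimator, which the abstract does not describe.
    """
    if not batch:
        return 0.0
    return sum(1.0 - max(predict_proba(x)) for x in batch) / len(batch)


def active_refresh_check(
    predict_proba: ProbaFn,
    batch: Sequence[float],
    baseline_error: float,
    oracle: Callable[[float], int],  # hypothetical: returns the true (costly) label
    margin: float = 0.05,            # hypothetical: how much higher counts as "significant"
    sample_size: int = 30,           # hypothetical: labelling budget per check
    seed: int = 0,
) -> Tuple[float, Optional[float]]:
    """Return (estimated_error, verified_error).

    verified_error is None when no significant increase is suspected,
    i.e. when no true labels were requested.
    """
    estimated = estimate_error(predict_proba, batch)
    if estimated <= baseline_error + margin:
        return estimated, None  # model looks fine: skip labelling entirely

    # Suspected degradation: label a random sample to verify it.
    rng = random.Random(seed)
    sample = rng.sample(list(batch), min(sample_size, len(batch)))
    wrong = 0
    for x in sample:
        probs = predict_proba(x)
        predicted = probs.index(max(probs))  # argmax over class probabilities
        if predicted != oracle(x):
            wrong += 1
    return estimated, wrong / len(sample)


def toy_proba(x: float) -> List[float]:
    """Hypothetical 1-D model: class 1 becomes more likely as x grows."""
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x))
    return [1.0 - p1, p1]


if __name__ == "__main__":
    baseline = estimate_error(toy_proba, [-2.0, -1.5, 1.5, 2.0])
    # Stable batch: far from the decision boundary, so no labels are requested.
    print(active_refresh_check(toy_proba, [-2.0, 2.0], baseline,
                               oracle=lambda x: 0 if x < 0 else 1))
    # Drifted batch: near the boundary, and the true concept has flipped.
    print(active_refresh_check(toy_proba, [-0.2, -0.1, 0.1, 0.2], baseline,
                               oracle=lambda x: 1 if x < 0 else 0))
```

The point of the structure is that labelling cost is paid only on demand: the cheap, label-free check runs on every batch, and the oracle is consulted only when it fires. The assumed confidence proxy has an obvious blind spot (a drift that flips labels while keeping the model confident goes unnoticed), which is one reason a real system would need a better-founded estimator than this sketch uses.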
Pages: 457-461
Number of pages: 5