Mining maximal frequent itemsets from data streams

被引:27
作者
Mao, Guojun [1 ]
Wu, Xindong
Zhu, Xingquan
Chen, Gong
Liu, Chunnian
机构
[1] Beijing Univ Technol, Sch Comp Sci, Beijing 100022, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
关键词
data mining; data stream; frequent itemset; set of itemsets;
D O I
10.1177/0165551506068179
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Frequent pattern mining from data streams is an active research topic in data mining. Existing research efforts often rely on a two-phase framework to discover frequent patterns: (1) using internal data structures to store meta-patterns obtained by scanning the stream data; and (2) re-mining the meta-patterns to finalize and output frequent patterns. The defectiveness of such a two-phase framework lies in the fact that the two stages provide barriers to dynamically and immediately finding frequent patterns with online functionalities. It is expected that a single-phase algorithm can fulfil frequent pattern mining from data streams in such a way that the users can see patterns in an immediate and dynamic manner, as soon as the patterns have become frequent. In this paper, we propose INSTANT, a single-phase algorithm for discovering frequent itemsets from data streams. The theoretical foundation of INSTANT is based on a framework theory on a:set of itemsets, which is also presented in the paper. The novel design of INSTANT ensures that it employs compact data structures to mine frequent patterns from data streams in a single phase. Our experimental results demonstrate the time and space efficiency of the proposed algorithm.
引用
收藏
页码:251 / 262
页数:12
相关论文
共 17 条
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]  
Agrawal R., 1994, Proceedings of the 20th International Conference on Very Large Data Bases. VLDB'94, P487
[3]   Online algorithms for mining semi-structured data stream [J].
Asai, T ;
Arimura, H ;
Abe, K ;
Kawasoe, S ;
Arikawa, S .
2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, :27-34
[4]  
Babcock B., 2002, Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), P1, DOI DOI 10.1145/543613.543615
[5]  
CHANG J, 2003, P 9 ACM SIGKDD INT C, P226
[6]   Moment: Maintaining closed frequent itemsets over a stream sliding window [J].
Chi, Y ;
Wang, HX ;
Yu, PS ;
Muntz, RR .
FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, :59-66
[7]  
DONG G, 2003, P 2003 WORKSH MAN PR
[8]  
Giannella C, 2003, TR587 IND U
[9]  
HAN J, 2000, P 2000 ACM SIGMOD IN, P1, DOI DOI 10.1145/342009.335372
[10]  
Li H.F., 2004, P 1 INT WORKSH KNOWL, P20