SeqStream: Mining Closed Sequential Patterns over Stream Sliding Windows

被引:30
作者
Chang, Lei [1 ,2 ,3 ]
Wang, Tengjiao [1 ,2 ]
Yang, Dongqing [1 ,2 ]
Luan, Hua [4 ]
机构
[1] Peking Univ, Sch Elect Engn & Comp Sci, Beijing, Peoples R China
[2] Peking Univ, Min Educ, Key Lab High Confidence Software Technol, Beijing, Peoples R China
[3] EMC Res China, Hong Hom, Hong Kong, Peoples R China
[4] Renmin Univ China, Sch Informat, Haidian, Peoples R China
来源
ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2008年
基金
国家高技术研究发展计划(863计划);
关键词
D O I
10.1109/ICDM.2008.36
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Previous studies have shown mining closed patterns provides more benefits than mining the complete set of frequent patterns, since closed pattern mining leads to more compact results and more efficient algorithms. It is quite useful in a data stream environment where memory and computation power are major concerns. This paper studies the problem of mining closed sequential patterns over data stream sliding windows. A synopsis structure IST (Inverse Closed Sequence Tree) is designed to keep inverse closed sequential patterns in current window An efficient algorithm SeqStream is developed to mine closed sequential patterns in stream windows incrementally, and various novel strategies are adopted in SeqStream to prune search space aggressively. Extensive experiments on both real and synthetic data sets show that SeqStream outperforms PrefixSpan, CloSpan and BIDE by a factor of about one to two orders of magnitude.
引用
收藏
页码:83 / +
页数:2
相关论文
共 23 条
[1]  
AGRAWAL R, 1996, MINING SEQUENTIAL PA
[2]  
AGRAWAL R, 1995, SEQUENTIAL PATTERN M
[3]  
Ayres J., 2002, SEQUENTIAL PATTERN M
[4]  
Babcock B., 2002, MODELS ISSUES DATA S
[5]  
BABCOCK B, 2003, MAINTAINING VARIANCE
[6]  
Chang J.H., 2005, J INFORM SCI, V31
[7]  
CHEN G, 2005, SEQUENTIAL PATTERN M
[8]  
CHENG H, 2004, INCSPAN INCREMENTAL
[9]  
EZEIFE C, 2007, SSM FREQUENT SEQUENT
[10]  
Gaber M., 2005, SIGMOD RECORD, V34