A scalable algorithm for mining maximal frequent sequences using a sample

被引:0
作者
Congnan Luo
Soon M. Chung
机构
[1] Wright State University,Department of Computer Science and Engineering
来源
Knowledge and Information Systems | 2008年 / 15卷
关键词
Data mining; Maximal frequent sequences; Sampling; Signatures; Prefix tree; Performance analysis;
D O I
暂无
中图分类号
学科分类号
摘要
In this paper, we propose an efficient scalable algorithm for mining Maximal Sequential Patterns using Sampling (MSPS). The MSPS algorithm reduces much more search space than other algorithms because both the subsequence infrequency-based pruning and the supersequence frequency-based pruning are applied. In MSPS, a sampling technique is used to identify long frequent sequences earlier, instead of enumerating all their subsequences. We propose how to adjust the user-specified minimum support level for mining a sample of the database to achieve better overall performance. This method makes sampling more efficient when the minimum support is small. A signature-based method and a hash-based method are developed for the subsequence infrequency-based pruning when the seed set of frequent sequences for the candidate generation is too big to be loaded into memory. A prefix tree structure is developed to count the candidate sequences of different sizes during the database scanning, and it also facilitates the customer sequence trimming. Our experiments showed MSPS has very good performance and better scalability than other algorithms.
引用
收藏
页码:149 / 179
页数:30
相关论文
共 7 条
[1]  
Domingo C(2002)Adaptive sampling methods for scaling up knowledge discovery algorithms Data Min Knowl Discov 6 131-152
[2]  
Gavalda R(1997)Using a hash-based method with transaction trimming for mining association rules IEEE Trans Knowl Data Eng 9 813-825
[3]  
Watanabe O(2001)SPADE: an efficient algorithm for mining frequent sequences Mach Learn 42 31-60
[4]  
Park JS(undefined)undefined undefined undefined undefined-undefined
[5]  
Chen MS(undefined)undefined undefined undefined undefined-undefined
[6]  
Yu PS(undefined)undefined undefined undefined undefined-undefined
[7]  
Zaki MJ(undefined)undefined undefined undefined undefined-undefined