An efficient algorithm for mining condensed sequential pattern bases

Cited by: 1
Author
Wang, Tao [1]
Affiliation
[1] Hubei Univ Econ, Coll Comp Sci & Technol, Wuhan, Peoples R China
Keywords
Data management; Data mining; Programming and algorithm theory
DOI
10.1108/03684921211275315
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Purpose - Mining sequential patterns in large databases has become an important data mining task with broad applications, such as business analysis, web mining, security, and bio-sequence analysis. The purpose of this paper is to propose the notion of a condensed frequent sequential pattern base (SP base) with a guaranteed maximal error bound.
Design/methodology/approach - A subset of the frequent sequential patterns is computed and then used to approximate the supports of arbitrary frequent sequential patterns within a guaranteed maximal error bound. This is justified because in many applications it is sufficient to know the support of each frequent sequential pattern to a close approximation rather than exactly.
Findings - The concept of a condensed frequent SP base is introduced, and an efficient algorithm for mining condensed SP bases is developed.
Research limitations/implications - A condensed frequent SP base can significantly reduce the set of sequential patterns that need to be mined, stored, and analyzed, while providing a guaranteed error bound for the frequencies of sequential patterns not in the base.
Practical implications - A much smaller base of patterns can help users comprehend the mining results. Computing a much smaller pattern base also leads to better efficiency.
Originality/value - The paper shows that, by adopting a novel pruning technique, the algorithm outperforms previous work by an order of magnitude.
Pages: 1289-1296
Number of pages: 8