PRISM: An effective approach for frequent sequence mining via prime-block encoding

被引:44
作者
Gouda, Karam [2 ]
Hassaan, Mosab [2 ]
Zaki, Mohammed J. [1 ]
机构
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
[2] Fac Sci, Dept Math, Banha, Egypt
基金
美国国家科学基金会;
关键词
Frequent sequence mining; Prime encoding; Data mining; DISCOVERY; ALGORITHM;
D O I
10.1016/j.jcss.2009.05.008
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Sequence mining is one of the fundamental data mining tasks. In this paper we present a novel approach for mining frequent sequences, called PRISM. It utilizes a vertical approach for enumeration and support counting, based on the novel notion of primal block encoding, which in turn is based on prime factorization theory. Via an extensive evaluation on both synthetic and real datasets, we show that PRISM outperforms popular sequence mining methods like SPADE [M.J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learn. J. 42 (1/2) (Jan/Feb 2001) 31-60], PrefixSpan [J. Pei, J. Han, B. Mortazavi-Asl. H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: Mining sequential patterns efficiently by prefixprojected pattern growth, in: Int'l Conf. Data Engineering, April 2001] and SPAM [J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using bitmaps, in: SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002], by an order of magnitude or more. (C) 2009 Elsevier Inc. All rights reserved.
引用
收藏
页码:88 / 102
页数:15
相关论文
共 14 条
[1]  
AGRAWAL R, 1995, 11 IEEE INT C DAT EN
[2]  
[Anonymous], 1996, 5 INT C EXT DAT TECH
[3]  
[Anonymous], 1997, ACM SIGACT NEWS
[4]  
AYRES J, 2002, SIGKDD INT C KNOWL D
[5]  
BALCAZAR JL, 2005, 10 INT C DAT THEOR
[6]  
GILBERT JL, 1995, ELEMENTS MODERN ALGE
[7]  
GOUDA K, 2007, IEEE INT C DAT MIN O
[8]   A generic motif discovery algorithm for sequential data [J].
Jensen, KL ;
Styczynski, MP ;
Rigoutsos, I ;
Stephanopoulos, GN .
BIOINFORMATICS, 2006, 22 (01) :21-28
[9]   Discovery of frequent episodes in event sequences [J].
Mannila, H ;
Toivonen, H ;
Verkamo, AI .
DATA MINING AND KNOWLEDGE DISCOVERY, 1997, 1 (03) :259-289
[10]  
MASSEGLIA F, 1998, EUR C PRINC DAT MIN