An UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining

被引:21
作者
Chen, Jinlin [1 ]
机构
[1] CUNY Queens Coll, Dept Comp Sci, Flushing, NY 11367 USA
关键词
Data mining algorithm; directed acyclic graph; performance analysis; sequential pattern; transaction database; PREFIXSPAN;
D O I
10.1109/TKDE.2009.135
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Traditional pattern growth-based approaches for sequential pattern mining derive length-(k + 1) patterns based on the projected databases of length-k patterns recursively. At each level of recursion, they unidirectionally grow the length of detected patterns by one along the suffix of detected patterns, which needs k levels of recursion to find a length-k pattern. In this paper, a novel data structure, UpDown Directed Acyclic Graph (UDDAG), is invented for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in left perpendicular log(2)k + 1 right perpendicular levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is large such that the average pattern length is close to 1, UDDAG and PrefixSpan have similar performance because the problem degrades into frequent item counting problem. However, UDDAG scales up much better. It often outperforms PrefixSpan by almost one order of magnitude in scalability tests. UDDAG is also considerably faster than Spade and LapinSpam. Except for extreme cases, UDDAG uses comparable memory to that of PrefixSpan and less memory than Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension toward applications involving searching in large spaces.
引用
收藏
页码:913 / 928
页数:16
相关论文
共 22 条
[1]  
AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
[2]  
Agrawal R., 1994, P 20 INT C VER LARG, P487, DOI DOI 10.5555/645920.672836
[3]  
[Anonymous], P WORKSH FREQ IT MIN
[4]  
Antunes C, 2003, LECT NOTES ARTIF INT, V2734, P239
[5]  
Ayres J., 2002, P ACM SIGKDD INT C K, P429
[6]  
Berkovich S, 2000, SOFTWARE PRACT EXPER, V30, P1531, DOI 10.1002/1097-024X(20001125)30:14<1531::AID-SPE347>3.0.CO
[7]  
2-A
[8]  
CHEN J, 2007, P WORLD WID WEB C WW
[9]  
CHEN J, ACM T KNOWL IN PRESS
[10]  
Garofalakis MN, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P223