An efficient algorithm for mining frequent sequences by a new strategy without support counting

被引：49

作者：

Chiu, DY ^{[1
]}

Wu, YH ^{[1
]}

Chen, ALP ^{[1
]}

机构：

[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu 300, Taiwan

来源：

20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS | 2004年

关键词：

D O I：

10.1109/ICDE.2004.1320012

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. In this paper, we propose a new strategy called DIrect Sequence Comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of non-frequent sequences. The main difference between the DISC strategy and the Previous works is the way to prime non-frequent sequences. The previous works are based on the anti-monotone property, which prune the non-frequent sequences according to the frequent sequences with shorter lengths. On the contrary, the DISC strategy primes the non-frequent sequences according to the other sequences with the same length. Moreover, we summarize three strategies used in the previous works and design an efficient algorithm called DISC-all to take advantages of all the four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design the dynamic version of our algorithm, which achieves a much better performance.

引用

页码：375 / 386

页数：12

共 18 条

[1]

AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415

[2]

[Anonymous], 1996, EDBT, DOI 10.1007/BFb0014140

[3]

[Anonymous], 2000, P 6 ACM SIGKDD INT C

[4]

[Anonymous], 1999, P 5 ACM SIGKDD INT C, DOI DOI 10.1145/312129.312275

[5]

Ayres Jay, 2002, Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining, P429, DOI 10.1145/775047.775109

[6] ZTR: a new format for DNA sequence trace data [J].

Bonfield, JK ;

Staden, R .

BIOINFORMATICS, 2002, 18 (01) :3-10

[7]

CHIU DY, 2003, EFFICIENT ALGORITHM

[8] Mining sequential patterns with regular expression constraints [J].

Garofalakis, M ;

Rastogi, R ;

Shim, K .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (03) :530-552

[9] Discovering nontrivial repeating patterns in music data [J].

Hsu, JL ;

Liu, CC ;

Chen, ALP .

IEEE TRANSACTIONS ON MULTIMEDIA, 2001, 3 (03) :311-325

[10]

Pei J, 2001, PROC INT CONF DATA, P215

← 1 2 →