BIDE: Efficient mining of frequent closed sequences

Cited by: 284
Authors
Wang, JY [1]
Han, JW [1]
Affiliations
[1] Univ Minnesota Twin Cities, Digital Technol Ctr, Minneapolis, MN 55455 USA
Source
20th International Conference on Data Engineering, Proceedings | 2004
DOI
10.1109/ICDE.2004.1319986
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Previous studies have presented convincing arguments that a frequent pattern mining algorithm should mine only the closed frequent patterns rather than all frequent patterns, because the closed set leads not only to a more compact yet complete result set but also to better efficiency. However, most previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called BI-Directional Extension, and prunes the search space more deeply than previous algorithms by using the BackScan pruning method and the ScanSkip optimization technique. A thorough performance study with both sparse and dense real-life data sets has demonstrated that BIDE significantly outperforms the previous algorithms: it consumes order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.
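
To make the term "frequent closed sequence" concrete, the following minimal Python sketch enumerates all frequent sequences of a toy database by brute force and then keeps only the closed ones, that is, those with no proper super-sequence of equal support. It is not BIDE: it deliberately uses the candidate maintenance-and-test approach that the abstract describes as costly, and it does not implement BI-Directional Extension, BackScan, or ScanSkip. The toy database, the single-item event model, and all function names are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: Sequence[Sequence[str]]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_sequences(database: Sequence[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Enumerate every frequent sequence by depth-first prefix growth."""
    items = sorted({item for seq in database for item in seq})
    found: Dict[Pattern, int] = {}

    def grow(prefix: Pattern) -> None:
        for item in items:
            candidate = prefix + (item,)
            sup = support(candidate, database)
            # Support never increases when a pattern is extended,
            # so infrequent candidates need not be grown further.
            if sup >= min_sup:
                found[candidate] = sup
                grow(candidate)

    grow(())
    return found


def closed_sequences(database: Sequence[Sequence[str]], min_sup: int) -> Dict[Pattern, int]:
    """Keep the frequent sequences with no super-sequence of equal support.

    This is the maintain-then-test approach: all frequent candidates are
    held in memory and compared pairwise (an assumption for illustration).
    """
    frequent = frequent_sequences(database, min_sup)
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
    }


if __name__ == "__main__":
    # Hypothetical toy database: each string is one sequence of single-item events.
    database = ["CAABC", "ABCB", "CABC", "ABBCA"]
    for pattern, sup in sorted(closed_sequences(database, min_sup=2).items()):
        print("".join(pattern), sup)
```

On this toy input with a minimum support of 2, the sketch finds 17 frequent sequences but only 6 closed ones, which illustrates why the closed set is the more compact result. The pairwise closure test is quadratic in the number of frequent sequences, which is the kind of cost that a scheme without candidate maintenance aims to avoid.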
Pages: 79-90
Page count: 12