MAFIA: A maximal frequent itemset algorithm

被引:145
作者
Burdick, D
Calimlim, M
Flannick, J
Gehrke, J
Yiu, TM
机构
[1] Univ Wisconsin, Madison, WI 53706 USA
[2] Cornell Univ, Ithaca, NY 14853 USA
[3] Stanford Univ, Janes H Clark Ctr, Stanford, CA 94305 USA
[4] Amazon Com, Seattle, WA 98104 USA
基金
美国国家科学基金会;
关键词
itemset mining; maximal itemsets; transactional databases;
D O I
10.1109/TKDE.2005.183
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present a new algorithm for mining maximal frequent itemsets from a transactional database. The search strategy of the algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms that significantly improve mining performance. Our implementation for support counting combines a vertical bitmap representation of the data with an efficient bitmap compression scheme. In a thorough experimental analysis, we isolate the effects of individual components of MAFIA including search space pruning techniques and adaptive compression. We also compare our performance with previous work by running tests on very different types of data sets. Our experiments show that MAFIA performs best when mining long itemsets and outperforms other algorithms on dense data by a factor of three to 30.
引用
收藏
页码:1490 / 1504
页数:15
相关论文
共 38 条
  • [1] Agarwal R. C., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P108, DOI 10.1145/347090.347114
  • [2] A tree projection algorithm for generation of frequent item sets
    Agarwal, RC
    Aggarwal, CC
    Prasad, VVV
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (03) : 350 - 371
  • [3] AGGARWAL CC, 1998, B IEEE COMP SOC TECH, V21, P23
  • [4] Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
  • [5] Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
  • [6] [Anonymous], P 1998 ACM SIGMOD IN
  • [7] [Anonymous], 1996, Advances in Knowledge Discovery and Data Mining, DOI DOI 10.1007/978-3-319-31750-2.
  • [8] [Anonymous], P 2000 INT C VER LAR
  • [9] [Anonymous], P INT C VER LARG DAT
  • [10] Bayardo R.J., 1999, P 5 ACM SIGKDD INT C, P145, DOI [10.1145/312129.312219, DOI 10.1145/312129.312219]