TFP: An efficient algorithm for mining top-K frequent closed itemsets

被引:125
作者
Wang, JY [1 ]
Han, JW
Lu, Y
Tzvetkov, P
机构
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Siebel Ctr Sci 2132, Urbana, IL 61801 USA
基金
美国国家科学基金会;
关键词
data mining; frequent itemset; association rules; mining methods and algorithms;
D O I
10.1109/TKDE.2005.81
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Frequent itemset mining has been studied extensively in literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without mins_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further speeded up by employing a top-down and bottom-up combined FP-Tree traversing strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
引用
收藏
页码:652 / 664
页数:13
相关论文
共 27 条
  • [1] Ada Wai-Chee Fu, 2000, Foundations of Intelligent Systems. 12th International Symposium, ISMIS 2000. Proceedings (Lecture Notes in Artificial Intelligence Vol.1932), P59
  • [2] AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
  • [3] Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
  • [4] [Anonymous], P 1998 ACM SIGMOD IN
  • [5] [Anonymous], P ACM SIGMOD 98
  • [6] Bastide Y., 2000, ACM SIGKDD Explor. Newsl., V2, P66, DOI DOI 10.1145/380995.381017
  • [7] BAY SD, 1999, P 5 ACM SIGKDD INT C, P302
  • [8] BONCHI F, 2003, P 7 EUR C PRINC PRAC
  • [9] Boulicaut JF, 2000, LECT NOTES ARTIF INT, V1805, P62
  • [10] Free-sets: A condensed representation of Boolean data for the approximation of frequency queries
    Boulicaut, JF
    Bykowski, A
    Rigotti, C
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (01) : 5 - 22