TFP: An efficient algorithm for mining top-K frequent closed itemsets

被引：125

作者：

Wang, JY ^{[1
]}

Han, JW

Lu, Y

Tzvetkov, P

机构：

[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China

[2] Univ Illinois, Dept Comp Sci, Siebel Ctr Sci 2132, Urbana, IL 61801 USA

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2005年 / 17卷 / 05期

基金：

美国国家科学基金会;

关键词：

data mining; frequent itemset; association rules; mining methods and algorithms;

D O I：

10.1109/TKDE.2005.81

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Frequent itemset mining has been studied extensively in literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without mins_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further speeded up by employing a top-down and bottom-up combined FP-Tree traversing strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.

引用

页码：652 / 664

页数：13

共 27 条

[1] Ada Wai-Chee Fu, 2000, Foundations of Intelligent Systems. 12th International Symposium, ISMIS 2000. Proceedings (Lecture Notes in Artificial Intelligence Vol.1932), P59
[2] AGRAWAL R, 1995, PROC INT CONF DATA, P3, DOI 10.1109/ICDE.1995.380415
[3] Agrawal R, 1994, P 20 INT C VER LARG, V1215, P487
[4] [Anonymous], P 1998 ACM SIGMOD IN
[5] [Anonymous], P ACM SIGMOD 98
[6] Bastide Y., 2000, ACM SIGKDD Explor. Newsl., V2, P66, DOI DOI 10.1145/380995.381017
[7] BAY SD, 1999, P 5 ACM SIGKDD INT C, P302
[8] BONCHI F, 2003, P 7 EUR C PRINC PRAC
[9] Boulicaut JF, 2000, LECT NOTES ARTIF INT, V1805, P62
[10] Free-sets: A condensed representation of Boolean data for the approximation of frequency queries
Boulicaut, JF
Bykowski, A
Rigotti, C
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (01) : 5 - 22

← 1 2 3 →