An FP-tree approach for mining N-most interesting itemsets

被引:1
作者
Cheung, YL [1 ]
Fu, AWC [1 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
来源
DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS AND TECHNOLOGY IV | 2002年 / 4730卷
关键词
association rules mining; itemsets; FP-tree; N-most interesting;
D O I
10.1117/12.460253
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In classical association rules mining, a minimum support threshold is assumed to be available for mining frequent itemsets. However, setting such a threshold is typically hard. If the threshold is set too high, nothing will be discovered; and if it is set too low, too many itemsets will be generated, which also implies inefficiency. In this paper, we handle a more practical problem, roughly speaking, it is to mine the N k-itemsets with the highest support for k up to a certain k(max) value. We call the results the N-most interesting itemsets. Generally, it is more straightforward for users to determine N and k(max). This approach also provides a solution for an open issue in the problem of subspace clustering. However, with the above problem definition without the support threshold, the subset closure property of the apriori-gen algorithm no longer holds. We propose three new algorithms, LOOPBACK, BOLB, and BOMO, for mining N-most interesting itemsets by variations of the FP-tree approach. Experiments show that all our methods outperform the previously proposed Itemset-Loop algorithm.
引用
收藏
页码:460 / 471
页数:12
相关论文
共 18 条
[1]  
AGGARWAL CC, 1998, B IEEE COMPUTER SOC, P23
[2]  
AGRAWAL R, 1998, P INT C MAN DAT SIGM
[3]  
Agrawal R., 1994, P 20 INT C VER LARG, V1215, P487
[4]  
[Anonymous], P PYOC ACM SIGMOD IN
[5]  
[Anonymous], P INT C VER LARG DAT
[6]  
Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
[7]  
CENTER IAR, SYNTHETIC DATA GENER
[8]  
CENTER MP, IPUMS 98
[9]  
Cheng C.-H., 1999, Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD'99), P84, DOI [10.1145/312129.312199, DOI 10.1145/312129.312199]
[10]  
FU AWC, 2000, P INT S METH INT SYS