Effective elimination of redundant association rules

被引:16
作者
Cheng, James [1 ]
Ke, Yiping [1 ]
Ng, Wilfred [1 ]
机构
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
关键词
association rules; redundancy elimination; concise representation;
D O I
10.1007/s10618-007-0084-8
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
It is well-recognized that the main factor that hinders the applications of Association Rules (ARs) is the huge number of ARs returned by the mining process. In this paper, we propose an effective solution that presents concise mining results by eliminating the redundancy in the set of ARs. We adopt the concept of delta tolerance to define the set of delta-Tolerance ARs (delta-TARs), which is a concise representation for the set of ARs. The notion of delta-tolerance is a relaxation on the closure defined on the support of frequent itemsets, thus allowing us to effectively prune the redundant ARs. We devise a set of inference rules, with which we prove that the set of delta-TARs is a non-redundant representation of ARs. In addition, we prove that the set of ARs that is derived from the delta-TARs by the inference rules is sound and complete. We also develop a compact tree structure called the delta-TAR tree, which facilitates the efficient generation of the delta-TARs and derivation of other ARs. Experimental results verify the efficiency of using the delta-TAR tree to generate the delta-TARs and to query the ARs. The set of delta-TARs is shown to be significantly smaller than the state-of-the-art concise representations of ARs. In addition, the approximation on the support and confidence of the ARs derived from the delta-TARs are highly accurate.
引用
收藏
页码:221 / 249
页数:29
相关论文
共 28 条
[1]   A new approach to online generation of association rules [J].
Aggarwal, CC ;
Yu, PS .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2001, 13 (04) :527-540
[2]  
AGRAWAL R, 1993, P ACM C MAN DAT SIGM
[3]  
[Anonymous], P 1998 ACM SIGMOD IN
[4]  
[Anonymous], 2006, ACM Computing Surveys, DOI DOI 10.1145/1132956.1132958
[5]  
Bastide I, 2000, LECT NOTES ARTIF INT, V1861, P972
[6]   Free-sets: A condensed representation of Boolean data for the approximation of frequency queries [J].
Boulicaut, JF ;
Bykowski, A ;
Rigotti, C .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (01) :5-22
[7]  
Calders T., 2002, Principles of Data Mining and Knowledge Discovery. 6th European Conference, PKDD 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2431), P74
[8]  
CHENG J, 2007, IN PRESS J INTELL IN
[9]  
CHENG J, 2006, P 6 IEEE INT C DAT M
[10]  
CHENG J, 2007, P ACM SIGMOD INT C M, P857