Generalized association rule mining using an efficient data structure

Cited by: 19
Authors
Wu, Chieh-Ming [1]
Huang, Yin-Fu [1]
Affiliations
[1] Natl Yunlin Univ Sci & Technol, Grad Sch Engn Sci & Technol, Touliu 640, Yunlin, Taiwan
Keywords
Data mining; Generalized association rules; Frequent itemsets; Frequent closed itemsets; FCET; GMAR; GMFI
DOI
10.1016/j.eswa.2010.12.023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper uses an efficient data structure to find generalized association rules between items at different levels of a taxonomy tree, under the assumption that the original frequent itemsets and association rules have been generated in advance. The primary challenge in designing an efficient mining algorithm is to use the original frequent itemsets and association rules to generate new generalized association rules directly, rather than rescanning the database. We use an efficient data structure called the frequent closed enumeration table (FCET) to store the relevant information. It stores only maximal itemsets, and the information for the subset itemsets of a maximal itemset can be derived through a hash function. The proposed algorithms, GMAR and GMFI, use join methods and/or pruning techniques to generate new generalized association rules. In several comprehensive experiments, both algorithms performed much better than the BASIC and Cumulate algorithms, which also used the efficient data structure (FCET), because GMAR and GMFI generate fewer candidate itemsets. Furthermore, the GMAR algorithm prunes a large number of irrelevant rules based on the minimum confidence. © 2010 Elsevier Ltd. All rights reserved.
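To make the notion of a generalized association rule concrete, the following minimal Python sketch computes support and confidence for itemsets that mix taxonomy levels. It is only an illustration of the definitions, not the FCET, GMAR, or GMFI from the paper: it rescans the transactions on every call, which is exactly the cost the paper aims to avoid. The taxonomy, the transactions, and the helper names (`ancestors`, `extend`, `support`, `confidence`) are all hypothetical.

```python
# Toy taxonomy (hypothetical): each item maps to its parent category.
TAXONOMY = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}

# Toy transaction database (hypothetical), containing leaf items only.
TRANSACTIONS = [
    {"shirt"},
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shoes"},
    {"shoes"},
    {"jacket"},
]


def ancestors(item):
    """Return every ancestor of `item` in the taxonomy."""
    result = set()
    while item in TAXONOMY:
        item = TAXONOMY[item]
        result.add(item)
    return result


def extend(transaction):
    """Add the ancestors of every item to a transaction."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item)
    return extended


def support(itemset, transactions):
    """Fraction of transactions whose extended form contains `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= extend(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent (0.0 if undefined)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base
```

In this toy data, the leaf-level itemset {jacket, hiking boots} appears in one of six transactions, while the generalized itemset {outerwear, hiking boots} appears in two, so a rule such as outerwear -> hiking boots can pass a support threshold that neither of its specializations meets.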
Pages: 7277-7290
Number of pages: 14