ETARM: an efficient top-k association rule mining algorithm

被引:25
作者
Nguyen, Linh T. T. [1 ]
Bay Vo [2 ]
Nguyen, Loan T. T. [3 ,4 ]
Fournier-Viger, Philippe [5 ]
Selamat, Ali [6 ]
机构
[1] Dong An Polytech, Fac Informat Technol, Binh Duong, Vietnam
[2] Ho Chi Minh City Univ Technol, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Ton Duc Thang Univ, Div Knowledge & Syst Engn ICT, Ho Chi Minh City, Vietnam
[4] Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam
[5] Harbin Inst Technol, Sch Humanities & Social Sci, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China
[6] Univ Teknol Malaysia, Utm Johor Bahru 81310, Johor, Malaysia
关键词
Data mining; Association rule mining; Top-k association rules; Rule Expansion; FREQUENT CLOSED ITEMSETS; PATTERNS; COMBINATION; DISCOVERY; LATTICE; LISTS;
D O I
10.1007/s10489-017-1047-4
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mining association rules plays an important role in data mining and knowledge discovery since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that they can generate a huge amount of association rules depending on how parameters are set. However, users are often only interested in finding the strongest rules, and do not want to go through a large amount of rules or wait for these rules to be generated. To address those needs, algorithms have been proposed to mine the top-k association rules in databases, where users can directly set a parameter k to obtain the k most frequent rules. However, a major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) to efficiently find the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to more effectively reduce the search space. These properties are applied during the candidate selection process to identify items that should not be used to expand a rule based on its confidence, to reduce the number of candidates. An extensive experimental evaluation on six standard benchmark datasets show that the proposed approach outperforms the state-of-the-art TopKRules algorithm both in terms of runtime and memory usage.
引用
收藏
页码:1148 / 1160
页数:13
相关论文
共 31 条
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]  
Agrawal R., P 20 INT C VERY LARG
[3]   A lattice-based approach for mining most generalization association rules [J].
Bay Vo ;
Hong, Tzung-Pei ;
Le, Bac .
KNOWLEDGE-BASED SYSTEMS, 2013, 45 :20-30
[4]   DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets [J].
Bay Vo ;
Hong, Tzung-Pei ;
Bac Le .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (08) :7196-7206
[5]   Interestingness measures for association rules: Combination between lattice and hash tables [J].
Bay Vo ;
Bac Le .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (09) :11630-11640
[6]   Mining top-k frequent patterns in the presence of the memory constraint [J].
Chuang, Kun-Ta ;
Huang, Jiun-Long ;
Chen, Ming-Syan .
VLDB JOURNAL, 2008, 17 (05) :1321-1344
[7]  
Deng ZH, 2007, PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, P851
[8]   DiffNodesets: An efficient structure for fast mining frequent itemsets [J].
Deng, Zhi-Hong .
APPLIED SOFT COMPUTING, 2016, 41 :214-223
[9]   PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via Children-Parent Equivalence pruning [J].
Deng, Zhi-Hong ;
Lv, Sheng-Long .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (13) :5424-5432
[10]   Fast mining Top-Rank-k frequent patterns by using Node-lists [J].
Deng, Zhi-Hong .
EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (04) :1763-1768