An efficient structure for fast mining high utility itemsets

被引:19
作者
Deng, Zhi-Hong [1 ]
机构
[1] Peking Univ, Sch Elect Engn & Comp Sci, Key Lab Machine Percept, Minist Educ, Beijing 100871, Peoples R China
基金
中国国家自然科学基金;
关键词
Data structure; Data mining; High utility itemset; PUN-list; Utility mining; FREQUENT ITEMSETS; ALGORITHMS; DATABASES; PATTERNS;
D O I
10.1007/s10489-017-1130-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
High utility itemset mining has emerged to be an important research issue in data mining since it has a wide range of real life applications. Although a number of algorithms have been proposed in recent years, the mining efficiency is still a big challenge since these algorithms suffer from either the problem of low efficiency of calculating candidates' utilities or the problem of generating huge number of candidates. In this paper, we propose a novel data structure named PUN-list (PU-tree-Node list), which maintains both the utility information about an itemset and utility upper bound for facilitating the processing of mining high utility itemsets. Based on PUN-lists, we present a method, named MIP (Mining high utility Itemset using PUN-Lists), for efficiently mining high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, named PUN-list, which avoids costly and repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUN-list of the itemset and the PUN-lists of long itemsets can be efficiently constructed by the PUN-lists of short itemsets. Third, by employing the utility upper bound lying in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, named set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that MIP is very efficient since it is much faster than HUI-Miner, d2HUP, and UP-Growth + , especially on dense datasets.
引用
收藏
页码:3161 / 3177
页数:17
相关论文
共 35 条
[1]  
Agrawal R., 1993, SIGMOD Record, V22, P207, DOI 10.1145/170036.170072
[2]  
Ahmed Chowdhury Farhan, 2010, Proceedings of the 11th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2010), P76, DOI 10.1109/SNPD.2010.21
[3]   HUC-Prune: an efficient candidate pruning technique to mine high utility patterns [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
APPLIED INTELLIGENCE, 2011, 34 (02) :181-198
[4]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[5]  
[Anonymous], 1994, P INT C VERY LARGE D
[6]   A Hybrid Approach for Mining Frequent Itemsets [J].
Bay Vo ;
Tuong Le ;
Coenen, Frans ;
Hong, Tzung-Pei .
2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, :4647-4651
[7]  
Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[8]  
Dawar S, 2015, IEEE IDEAS, V2015, P56
[9]   DiffNodesets: An efficient structure for fast mining frequent itemsets [J].
Deng, Zhi-Hong .
APPLIED SOFT COMPUTING, 2016, 41 :214-223
[10]   PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via Children-Parent Equivalence pruning [J].
Deng, Zhi-Hong ;
Lv, Sheng-Long .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (13) :5424-5432