An efficient structure for fast mining high utility itemsets

被引:19
作者
Deng, Zhi-Hong [1 ]
机构
[1] Peking Univ, Sch Elect Engn & Comp Sci, Key Lab Machine Percept, Minist Educ, Beijing 100871, Peoples R China
基金
中国国家自然科学基金;
关键词
Data structure; Data mining; High utility itemset; PUN-list; Utility mining; FREQUENT ITEMSETS; ALGORITHMS; DATABASES; PATTERNS;
D O I
10.1007/s10489-017-1130-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
High utility itemset mining has emerged to be an important research issue in data mining since it has a wide range of real life applications. Although a number of algorithms have been proposed in recent years, the mining efficiency is still a big challenge since these algorithms suffer from either the problem of low efficiency of calculating candidates' utilities or the problem of generating huge number of candidates. In this paper, we propose a novel data structure named PUN-list (PU-tree-Node list), which maintains both the utility information about an itemset and utility upper bound for facilitating the processing of mining high utility itemsets. Based on PUN-lists, we present a method, named MIP (Mining high utility Itemset using PUN-Lists), for efficiently mining high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, named PUN-list, which avoids costly and repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUN-list of the itemset and the PUN-lists of long itemsets can be efficiently constructed by the PUN-lists of short itemsets. Third, by employing the utility upper bound lying in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, named set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that MIP is very efficient since it is much faster than HUI-Miner, d2HUP, and UP-Growth + , especially on dense datasets.
引用
收藏
页码:3161 / 3177
页数:17
相关论文
共 35 条
[21]   Direct Discovery of High Utility Itemsets without Candidate Generation [J].
Liu, Junqiang ;
Wang, Ke ;
Fung, Benjamin C. M. .
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, :984-989
[22]  
Liu MC., 2012, ACM INT C P SERIES, P55, DOI DOI 10.1145/2396761.2396773
[23]  
Liu Y, 2005, LECT NOTES ARTIF INT, V3518, P689
[24]   H-mine: Hyper-structure mining of frequent patterns in large databases [J].
Pei, J ;
Han, JW ;
Lu, HJ ;
Nishio, S ;
Tang, SW ;
Yang, DQ .
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, :441-448
[25]  
RYMON R, 1992, PRINCIPLES OF KNOWLEDGE REPRESENTATION AND REASONING: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE (KR 92), P539
[26]   Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets [J].
Tseng, Vincent S. ;
Wu, Cheng-Wei ;
Fournier-Viger, Philippe ;
Yu, Philip S. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (03) :726-739
[27]   Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases [J].
Tseng, Vincent S. ;
Shie, Bai-En ;
Wu, Cheng-Wei ;
Yu, Philip S. .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (08) :1772-1786
[28]  
Tseng VS, 2010, SIGKDD, V2010, P253
[29]   Mining frequent itemsets using the N-list and subsume concepts [J].
Vo, Bay ;
Le, Tuong ;
Coenen, Frans ;
Hong, Tzung-Pei .
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2016, 7 (02) :253-265
[30]  
Wu CW, 2012, P 18 ACM SIGKDD INT, P78, DOI DOI 10.1145/2339530.2339546