Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets

被引:100
作者
Tseng, Vincent S. [1 ]
Wu, Cheng-Wei [1 ]
Fournier-Viger, Philippe [2 ]
Yu, Philip S. [3 ]
机构
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 70101, Taiwan
[2] Univ Moncton, Dept Comp Sci, Moncton, NB E1A 3E9, Canada
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
基金
美国国家科学基金会;
关键词
Frequent itemset; closed(+) high utility itemset; lossless and concise representation; utility mining; data mining; FREQUENT PATTERNS; SPACE; SETS;
D O I
10.1109/TKDE.2014.2345377
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mining high utility itemsets (HUIs) from databases is an important data mining task, which refers to the discovery of itemsets with high utilities (e.g. high profits). However, it may present too many HUIs to users, which also degrades the efficiency of the mining process. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework in this paper for mining closed(+) high utility itemsets (CHUIs), which serves as a compact and lossless representation of HUIs. We propose three efficient algorithms named AprioriCH (Apriori-based algorithm for mining High utility Closed(+) itemsets), AprioriHC-D (AprioriHC algorithm with Discarding unpromising and isolated items) and CHUD (Closed(+) High Utility Itemset Discovery) to find this representation. Further, a method called DAHU (Derive All High Utility Itemsets) is proposed to recover all HUIs from the set of CHUIs without accessing the original database. Results on real and synthetic datasets show that the proposed algorithms are very efficient and that our approaches achieve a massive reduction in the number of HUIs. In addition, when all HUIs can be recovered by DAHU, the combination of CHUD and DAHU outperforms the state-of-the-art algorithms for mining HUIs.
引用
收藏
页码:726 / 739
页数:14
相关论文
共 31 条
[1]  
Agrawal R., P 20 INT C VERY LARG
[2]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[3]  
[Anonymous], 2003, P 9 ACM SIGKDD INT C, DOI [10.1145/956750.956779, DOI 10.1145/956750.956779]
[4]  
Bac L, 2009, 2009 FIRST ASIAN CONFERENCE ON INTELLIGENT INFORMATION AND DATABASE SYSTEMS, P13, DOI 10.1109/ACIIDS.2009.55
[5]  
Bay V, 2009, LECT NOTES ARTIF INT, V5711, P251
[6]   Free-sets: A condensed representation of Boolean data for the approximation of frequency queries [J].
Boulicaut, JF ;
Bykowski, A ;
Rigotti, C .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (01) :5-22
[7]  
Calders T., 2002, Principles of Data Mining and Knowledge Discovery. 6th European Conference, PKDD 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2431), P74
[8]  
Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[9]  
Cheng Wei Wu, 2011, Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM 2011), P824, DOI 10.1109/ICDM.2011.60
[10]   Mining top-k frequent patterns in the presence of the memory constraint [J].
Chuang, Kun-Ta ;
Huang, Jiun-Long ;
Chen, Ming-Syan .
VLDB JOURNAL, 2008, 17 (05) :1321-1344