Efficient Algorithms for Mining Top-K High Utility Itemsets

被引:160
作者
Tseng, Vincent S. [1 ]
Wu, Cheng-Wei [1 ]
Fournier-Viger, Philippe [2 ]
Yu, Philip S. [3 ,4 ]
机构
[1] Natl Chao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Univ Moncton, Dept Comp Sci, Moncton, NB E1A 3E9, Canada
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
[4] Tsinghua Univ, Inst Data Sci, Beijing 100084, Peoples R China
基金
美国国家科学基金会;
关键词
Utility mining; high utility itemset mining; top-k pattern mining; top-k high utility itemset mining; FREQUENT PATTERNS;
D O I
10.1109/TKDE.2015.2458860
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
High utility itemsets (HUIs) mining is an emerging topic in data mining, which refers to discovering all itemsets having a utility meeting a user-specified minimum utility threshold min_util. However, setting min_util appropriately is a difficult problem for users. Generally speaking, finding an appropriate minimum utility threshold by trial and error is a tedious process for users. If min_util is set too low, too many HUIs will be generated, which may cause the mining process to be very inefficient. On the other hand, if min_util is set too high, it is likely that no HUIs will be found. In this paper, we address the above issues by proposing a new framework for top-k high utility itemset mining, where k is the desired number of HUIs to be mined. Two types of efficient algorithms named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase) are proposed for mining such itemsets without the need to set min_util. We provide a structural comparison of the two algorithms with discussions on their advantages and limitations. Empirical evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.
引用
收藏
页码:54 / 67
页数:14
相关论文
共 38 条
[1]  
Agrawal R., P 20 INT C VERY LARG
[2]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[3]  
Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[4]  
Cheng Wei Wu, 2011, Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM 2011), P824, DOI 10.1109/ICDM.2011.60
[5]   Mining top-k frequent patterns in the presence of the memory constraint [J].
Chuang, Kun-Ta ;
Huang, Jiun-Long ;
Chen, Ming-Syan .
VLDB JOURNAL, 2008, 17 (05) :1321-1344
[6]  
Fournier-Viger Philippe, 2012, Advances in Artificial Intelligence. Proceedings 25th Canadian Conference on Artificial Intelligence, Canadian AI 2012, P61, DOI 10.1007/978-3-642-30353-1_6
[7]   Novel concise representations of high utility itemsets using generator patterns [J].
Fournier-Viger, Philippe ;
Wu, Cheng-Wei ;
Tseng, Vincent S .
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, 8933 :30-43
[8]  
Fournier-Viger P, 2011, LECT NOTES ARTIF INT, V7121, P180
[9]  
Han JW, 2002, 2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P211, DOI 10.1109/ICDM.2002.1183905
[10]  
Han JW, 2000, SIGMOD RECORD, V29, P1