Efficient top-k high utility itemset mining on massive data

被引:30
作者
Han, Xixian [1 ]
Liu, Xianmin [1 ]
Li, Jianzhong [1 ]
Gao, Hong [1 ]
机构
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
关键词
Massive data; Top-k high utility itemsets; Prefix-based partitioning; Full suffix utility; ALGORITHMS; THRESHOLD;
D O I
10.1016/j.ins.2020.08.028
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In practical applications, top-k high utility itemset mining (top-k HUIM) is an interesting operation to find the k itemsets with the highest utilities. It is analyzed that, the existing algorithms only can deal with the small and medium-sized data, and their performance degrades significantly on massive data. This paper presents a novel top-k HUIM algorithm PTM which can mine top-k high utility itemsets on massive data efficiently. PTM maintains the transaction database as a set of prefix-based partitions, each of which can fit in the memory and keeps the transactions with the same prefix item. The utility of an itemset can be computed by the transactions in one particular partition. PTM processes the prefix-based partitions in the order of average transaction utility to raise utility threshold faster. By the concise assistant data structures, PTM can skip majority of the partitions directly. For the partitions to be processed, PTM utilizes depth-first search on the set enumeration tree of the promising items to find the required results. The full-suffix-utilitybased subtree pruning rule is devised to reduce the exploration space of set enumeration tree effectively. The extensive experimental results show that PTM can discover the top-k high utility itemsets on massive data efficiently. (C) 2020 Elsevier Inc. All rights reserved.
引用
收藏
页码:382 / 406
页数:25
相关论文
共 34 条
[1]  
Ada Wai-Chee Fu, 2000, Foundations of Intelligent Systems. 12th International Symposium, ISMIS 2000. Proceedings (Lecture Notes in Artificial Intelligence Vol.1932), P59
[2]  
Aggarwal C., 2014, FREQUENT PATTERN MIN, DOI DOI 10.1007/978-3-319-07821-2
[3]  
Agrawal R, 1994, P 20 INT C VER LARG
[4]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[5]   Mining frequent itemsets without support threshold: With and without item constraints [J].
Cheung, YL ;
Fu, AWC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (09) :1052-1069
[6]   Exploiting GPU and cluster parallelism in single scan frequent itemset mining [J].
Djenouri, Youcef ;
Djenouri, Djamel ;
Belhadi, Asma ;
Cano, Alberto .
INFORMATION SCIENCES, 2019, 496 :363-377
[7]   Efficient high utility itemset mining using buffered utility-lists [J].
Duong, Quang-Huy ;
Fournier-Viger, Philippe ;
Ramampiaro, Heri ;
Norvag, Kjetil ;
Dam, Thu-Lan .
APPLIED INTELLIGENCE, 2018, 48 (07) :1859-1877
[8]  
Fournier-Viger P., 2019, HIGH UTILITY PATTERN
[9]   The SPMF Open-Source Data Mining Library Version 2 [J].
Fournier-Viger, Philippe ;
Lin, Jerry Chun-Wei ;
Gomariz, Antonio ;
Gueniche, Ted ;
Soltani, Azadeh ;
Deng, Zhihong ;
Hoang Thanh Lam .
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2016, PT III, 2016, 9853 :36-40
[10]  
Fournier-Viger P, 2014, LECT NOTES COMPUT SC, V8436, P83, DOI 10.1007/978-3-319-06483-3_8