An empirical evaluation of high utility itemset mining algorithms

被引:26
作者
Zhang, Chongsheng [1 ]
Almpanidis, George [1 ]
Wang, Wanwan [1 ]
Liu, Changchang [1 ]
机构
[1] Henan Univ, Sch Comp & Informat Engn, Kaifeng 475001, Peoples R China
基金
美国国家科学基金会;
关键词
itemset mining; High utility itemsets; State-of-the-art high utility itemset mining; EFFICIENT ALGORITHMS; SEQUENTIAL PATTERNS; DISCOVERY;
D O I
10.1016/j.eswa.2018.02.008
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
High utility itemset mining (HUIM) has emerged as an important research topic in data mining, with applications to retail-market data analysis, stock market prediction, and recommender systems, etc. However, there are very few empirical studies that systematically compare the performance of state-of-the-art HUIM algorithms. In this paper, we present an experimental evaluation on 10 major HUIM algorithms, using 9 real world and 27 synthetic datasets to evaluate their performance. Our experiments show that EFIM and d2HUP are generally the top two performers in running time, while EFIM also consumes the least memory in most cases. In order to compare these two algorithms in depth, we use another 45 synthetic datasets with varying parameters so as to study the influence of the related parameters, in particular the number of transactions, the number of distinct items and average transaction length, on the running time and memory consumption of EFIM and d2HUP. In this work, we demonstrate that, d2HUP is more efficient than EFIM under low minimum utility values and with large sparse datasets, in terms of running time; although EFIM is the fastest in dense real datasets, it is among the slowest algorithms in sparse datasets. We suggest that, when a dataset is very sparse or the average transaction length is large, and running time is favoured over memory consumption, d2HUP should be chosen. Finally, we compare d2HUP and EFIM with two newest algorithms, mHUlMiner and ULB-Miner, and find these two algorithms have moderate performance. This work has reference value for researchers and practitioners when choosing the most appropriate HUIM algorithm for their specific applications. (C) 2018 Elsevier Ltd. All rights reserved.
引用
收藏
页码:91 / 115
页数:25
相关论文
共 42 条
[1]  
Agrawal R., P 20 INT C VERY LARG
[2]   A Novel Approach for Mining High-Utility Sequential Patterns in Sequence Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo .
ETRI JOURNAL, 2010, 32 (05) :676-686
[3]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[4]   CRoM and HuspExt: Improving Efficiency of High Utility Sequential Pattern Extraction [J].
Alkan, Oznur Kirmemis ;
Karagoz, Pinar .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (10) :2645-2657
[5]  
[Anonymous], 2010, KDD, DOI DOI 10.1145/1835804.1835839
[6]   A scalable algorithm for the market basket analysis [J].
Cavique, Luis .
JOURNAL OF RETAILING AND CONSUMER SERVICES, 2007, 14 (06) :400-407
[7]  
Chan R, 2003, THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, P19
[8]  
Dawar S, 2015, IEEE IDEAS, V2015, P56
[9]  
Deng Z.-H., 2015, 151002188 ARXIV
[10]  
Duong Q- H, 2017, APPL INTELL, P1