Fast algorithm for high utility pattern mining with the sum of item quantities

被引:36
作者
Ryang, Heungmo [1 ]
Yun, Unil [1 ]
Ryu, Keun Ho [2 ]
机构
[1] Sejong Univ, Dept Comp Engn, Seoul, South Korea
[2] Chungbuk Natl Univ, Dept Comp Sci, Cheongju, South Korea
基金
新加坡国家研究基金会;
关键词
Data mining; high utility patterns; single-pass tree construction; tree restructuring; utility mining; FREQUENT ITEMSETS;
D O I
10.3233/IDA-160811
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In frequent pattern mining, items are considered as having the same importance in a database and their occurrence are represented as binary values in transactions. In real-world databases, however, items not only have relative importance but also are represented as non-binary values in transactions. High utility pattern mining is one of the most essential issues in the pattern mining field, which recently emerged to address the limitation of frequent pattern mining. Meanwhile, tree construction with a single database scan is significant since a database scan is a time-consuming task. In utility mining, an additional database scan is necessary to identify actual high utility patterns from candidates. In this paper, we propose a novel tree structure, namely SIQ-Tree (Sum of Item Quantities), which captures database information through a single-pass. Moreover, a restructuring method is suggested with strategies for reducing overestimated utilities. The proposed algorithm can construct the SIQ-Tree with only a single scan and decrease the number of candidate patterns effectively with the reduced overestimation utilities, through which mining performance is improved. Experimental results show that our algorithm outperforms a state-of-the-art one in terms of runtime and the number of generated candidates with a similar memory usage.
引用
收藏
页码:395 / 415
页数:21
相关论文
共 37 条
[1]   Interactive mining of high utility patterns over data streams [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Choi, Ho-Jin .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (15) :11979-11991
[2]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[3]  
[Anonymous], 1994, P INT C VERY LARGE D
[4]   Extracting share frequent itemsets with infrequent subsets [J].
Barber, B ;
Hamilton, HJ .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (02) :153-185
[5]  
Barber B., 2000, PRINCIPLES DATA MINI
[6]  
Cheng Wei Wu, 2011, Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM 2011), P824, DOI 10.1109/ICDM.2011.60
[7]   UT-Tree: Efficient mining of high utility itemsets from data streams [J].
Feng, Lin ;
Wang, Le ;
Jin, Bo .
INTELLIGENT DATA ANALYSIS, 2013, 17 (04) :585-602
[8]  
Han JW, 2000, SIGMOD RECORD, V29, P1
[9]   Algorithms for mining frequent itemsets in static and dynamic datasets [J].
Hernandez-Leon, R. ;
Hernandez-Palancar, J. ;
Carrasco-Ochoa, Jesus A. ;
Fco Martinez-Trinidad, Jose .
INTELLIGENT DATA ANALYSIS, 2010, 14 (03) :419-435
[10]   Effective utility mining with the measure of average utility [J].
Hong, Tzung-Pei ;
Lee, Cho-Han ;
Wang, Shyue-Liang .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (07) :8259-8265