Efficiently Mining Top-K High Utility Sequential Patterns

Cited by: 75
Authors
Yin, Junfu [1]
Zheng, Zhigang [1]
Cao, Longbing [1]
Song, Yin [1]
Wei, Wei [1]
Affiliation
[1] Univ Technol Sydney, Adv Analyt Inst, Sydney, NSW 2007, Australia
Source
2013 IEEE 13th International Conference on Data Mining (ICDM) | 2013
Keywords
High utility sequential pattern mining; Top-K sequential pattern mining
DOI
10.1109/ICDM.2013.148
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
High utility sequential pattern mining is an emerging topic in the data mining community. Compared with classic frequent sequence mining, the utility framework provides more informative and actionable knowledge, since the utility of a sequence indicates business value and impact. However, the introduction of "utility" makes the problem fundamentally different from the frequency-based pattern mining framework and brings considerable challenges. Although existing high utility sequential pattern mining algorithms can discover all the patterns that satisfy a given minimum utility, it is often difficult for users to set a proper minimum utility: a value that is too small may produce thousands of patterns, whereas one that is too large may yield none. In this paper, we propose a novel framework called top-k high utility sequential pattern mining to tackle this critical problem. Accordingly, an efficient algorithm, Top-k high Utility Sequence (TUS for short) mining, is designed to identify the top-k high utility sequential patterns without a preset minimum utility. In addition, three effective features are introduced to address the efficiency problem: two strategies for raising the threshold and one pruning strategy for filtering unpromising items. Experiments are conducted on both synthetic and real datasets. The results show that TUS, with these efficiency-enhancing strategies, delivers impressive performance without missing any high utility sequential patterns.
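The general idea of replacing a user-set minimum utility with a threshold that rises during the search can be illustrated with a small generic sketch. This is not the TUS algorithm, whose data structures and strategies are not given in the abstract. The function name `top_k_by_utility` and the toy (pattern, utility) pairs are hypothetical, and the sketch assumes each candidate's utility has already been computed.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_by_utility(
    candidates: Iterable[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], float]:
    """Keep the k highest-utility patterns while raising an internal threshold.

    Hypothetical helper for illustration only; it assumes utilities are
    precomputed and does not implement any pruning of the search space.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    heap: List[Tuple[float, str]] = []  # min-heap keyed on utility
    threshold = 0.0  # starts at zero, so no minimum utility is needed up front

    for pattern, utility in candidates:
        # Once k patterns are held, anything at or below the threshold
        # cannot enter the top-k and is skipped (ties are discarded).
        if len(heap) == k and utility <= threshold:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (utility, pattern))
        else:
            heapq.heapreplace(heap, (utility, pattern))
        # The k-th best utility seen so far becomes the new threshold.
        if len(heap) == k:
            threshold = heap[0][0]

    ranked = sorted(heap, key=lambda entry: entry[0], reverse=True)
    return [(pattern, utility) for utility, pattern in ranked], threshold
```

Calling `top_k_by_utility(pairs, 2)` returns the two highest-utility patterns in descending order together with the final threshold. A real miner would additionally use the threshold to discard candidates before their utilities are fully computed.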
Pages: 1259-1264
Page count: 6