High utility pattern mining over data streams with sliding window technique

被引:80
作者
Ryang, Heungmo [1 ]
Yun, Unil [1 ]
机构
[1] Sejong Univ, Dept Comp Engn, Seoul, South Korea
基金
新加坡国家研究基金会;
关键词
Data mining; High utility pattern; Sliding window; Stream mining; Utility mining; FREQUENT ITEMSETS; EFFICIENT ALGORITHMS; DISCOVERY;
D O I
10.1016/j.eswa.2016.03.001
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Processing changeable data streams in real time is one of the most important issues in the data mining field due to its broad applications such as retail market analysis, wireless sensor networks, and stock market prediction. In addition, it is an interesting and challenging problem to deal with the stream data since not only the data have unbounded, continuous, and high speed characteristics but also their environments have limited resources. High utility pattern mining, meanwhile, is one of the essential research topics in pattern mining to overcome major drawbacks of the traditional framework for frequent pattern mining that takes only binary databases and identical item importance into consideration. This approach conducts mining processes by reflecting characteristics of real world databases, non-binary quantities and relative importance of items. Although relevant algorithms were proposed for finding high utility patterns in stream environments, they suffer from a level-wise candidate generation-and-test and a large number of candidates by their overestimation techniques. As a result, they consume a huge amount of execution time, which is a significant performance issue since a rapid process is necessary in stream data analysis. In this paper, we propose an algorithm for mining high utility patterns from resource limited environments through efficient processing of data streams in order to solve the problems of the overestimation-based methods. To improve mining performance with fewer candidates and search space than the previous ones, we develop two techniques for reducing overestimated utilities. Moreover, we suggest a tree-based data structure to maintain information of stream data and high utility patterns. The proposed tree is restructured by our updating method with decreased overestimation utilities to keep up-to-date stream information whenever the current window slides. Our approach also has an important effect on expert and intelligent systems in that it can provide users with more meaningful information than traditional analysis methods by reflecting the characteristics of real world non-binary databases in stream environments and emphasizing on recent data. Comprehensive experimental results show that our algorithm outperforms the existing sliding window-based one in terms of runtime efficiency and scalability. (C) 2016 Elsevier Ltd. All rights reserved.
引用
收藏
页码:214 / 231
页数:18
相关论文
共 66 条
[1]   Interactive mining of high utility patterns over data streams [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Choi, Ho-Jin .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (15) :11979-11991
[2]   Single-pass incremental and interactive mining for weighted frequent patterns [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo ;
Choi, Ho-Jin .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (09) :7976-7994
[3]   Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases [J].
Ahmed, Chowdhury Farhan ;
Tanbeer, Syed Khairuzzaman ;
Jeong, Byeong-Soo ;
Lee, Young-Koo .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (12) :1708-1721
[4]   CRoM and HuspExt: Improving Efficiency of High Utility Sequential Pattern Extraction [J].
Alkan, Oznur Kirmemis ;
Karagoz, Pinar .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (10) :2645-2657
[5]  
[Anonymous], 1994, P INT C VERY LARGE D
[6]   Mining frequent closed inter-sequence patterns efficiently using dynamic bit vectors [J].
Bac Le ;
Minh-Thai Tran ;
Bay Vo .
APPLIED INTELLIGENCE, 2015, 43 (01) :74-84
[7]   Extracting share frequent itemsets with infrequent subsets [J].
Barber, B ;
Hamilton, HJ .
DATA MINING AND KNOWLEDGE DISCOVERY, 2003, 7 (02) :153-185
[8]  
Barber B, 2000, LECT NOTES COMPUT<D>, V1910, P316
[9]   A new method for mining Frequent Weighted Itemsets based on WIT-trees [J].
Bay Vo ;
Coenen, Frans ;
Bac Le .
EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (04) :1256-1264
[10]   DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets [J].
Bay Vo ;
Hong, Tzung-Pei ;
Bac Le .
EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (08) :7196-7206