TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams

被引:16
作者
Yang, Bei [1 ,2 ]
Huang, Houkuan [2 ]
机构
[1] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Peoples R China
[2] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
关键词
Data stream; Landmark window; Top-K significant itemset; Pruning; Dynamic approximate algorithm; RECENTLY FREQUENT ITEMSETS;
D O I
10.1007/s10115-009-0211-5
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Frequent itemset mining over data streams becomes a hot topic in data mining and knowledge discovery in recent years, and has been applied to different areas. However, the setting of a minimum support threshold needs some domain knowledge. It will bring a lot of difficulties or much burden to users if the support threshold is not set reasonably. It is interesting for users to find top-K frequent itemsets over data streams. In this paper, a dynamical incremental approximate algorithm TOPSIL-Miner is presented to mine top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potential significant itemsets and other data structures of maximum support list, ordered item list, TOPSET and minimum support list are devised to maintain information about mining results. Moreover, three optimal strategies are exploited to reduce time and space cost of the algorithm: (1) pruning trivial nodes in the current data stream, (2) promoting mining support threshold during mining process adaptively and heuristically, and (3) promoting pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the good effectiveness and the high efficiency and precision of the algorithm.
引用
收藏
页码:225 / 242
页数:18
相关论文
共 24 条
[1]  
Ada Wai-Chee Fu, 2000, Foundations of Intelligent Systems. 12th International Symposium, ISMIS 2000. Proceedings (Lecture Notes in Artificial Intelligence Vol.1932), P59
[2]  
Agrawal R., 1994, P 20 INT C VER LARG, P487, DOI DOI 10.5555/645920.672836
[3]  
[Anonymous], 2003, P 3 ACM SIGCOMM INTE, DOI [DOI 10.1145/948205.948227, 10.1145/948205.948227]
[4]  
BABCOCK B, 2003, P 2003 ACM SIGMOD IN, P28, DOI DOI 10.1145/872757.872764
[5]  
Babcock B., 2002, PODS, P1, DOI [DOI 10.1145/543613.543615, 10.1145/543613.543615]
[6]  
Chang JH, 2004, J INF SCI ENG, V20, P753
[7]   Finding recently frequent itemsets adaptively over online transactional data streams [J].
Chang, Joong Hyuk ;
Lee, Won Suk .
INFORMATION SYSTEMS, 2006, 31 (08) :849-869
[8]   Mining frequent itemsets without support threshold: With and without item constraints [J].
Cheung, YL ;
Fu, AWC .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (09) :1052-1069
[9]   Catch the moment: maintaining closed frequent itemsets over a data stream sliding window [J].
Chi, Yun ;
Wang, Haixun ;
Yu, Philip S. ;
Muntz, Richard R. .
KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 10 (03) :265-294
[10]   Online mining of frequent sets in data streams with error guarantee [J].
Dang, Xuan Hong ;
Ng, Wee-Keong ;
Ong, Kok-Leong .
KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (02) :245-258