Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams

Cited by: 1
Authors
Fang’ai Liu
Qianqian Wang
Xin Wang
Affiliation
[1] Shandong Normal University, School of Information Science & Engineering
Source
Cluster Computing | 2019 / Vol. 22
Keywords
Stream data mining; Multiple data streams; Parallel algorithm; Sliding window; Frequent itemsets; Collaborative frequent itemsets
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
With the rapid development of World Wide Web technology, complex and diverse data are growing explosively, so frequent itemset mining plays an essential role. Because the limited computing power of a single processor constrains the mining of frequent itemsets in multiple data streams, an improved algorithm for Parallel Mining Collaborative frequent itemsets in multiple data streams (PMCMD-Stream) is proposed. First, the algorithm compresses the potential and frequent itemsets into a CP-Tree in a single scan and applies an incremental method to insert or delete the related branches of the CP-Tree; this avoids repeatedly scanning the databases to generate many candidate frequent itemsets and saves running time. Second, the parallelized algorithm can run in the MapReduce programming environment: because each candidate frequent itemset is independent, different candidate frequent itemsets can be processed by multiple computing machines concurrently. Finally, the valuable frequent itemsets, namely the global collaborative frequent itemsets, are obtained. The experimental results show that the PMCMD-Stream algorithm not only improves mining efficiency but also scales much better than the existing algorithms, so it can discover collaborative frequent itemsets from large-scale data streams.
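The general idea in the abstract (incremental sliding-window counting per stream, followed by a combination step across streams) can be illustrated with a minimal Python sketch. This is not the PMCMD-Stream algorithm: a flat counter stands in for the CP-Tree, the map and reduce steps are plain loops rather than MapReduce jobs, and "collaborative" is assumed here to mean "locally frequent in at least `min_streams` streams". The names `WindowCounter`, `collaborative_frequent`, `max_len`, and `min_streams` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


class WindowCounter:
    """Sliding-window itemset counts for one stream.

    Assumption: a flat Counter replaces the CP-Tree of the paper.
    """

    def __init__(self, window_size, max_len=2):
        self.window = deque()
        self.window_size = window_size
        self.max_len = max_len  # largest itemset size that is tracked
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # Incremental update: count the new transaction once...
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        # ...and remove the oldest one when the window overflows.
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            for s in self._itemsets(old):
                self.counts[s] -= 1
                if self.counts[s] == 0:
                    del self.counts[s]

    def frequent(self, min_support):
        threshold = min_support * len(self.window)
        return {s for s, c in self.counts.items() if c >= threshold}


def collaborative_frequent(streams, window_size, min_support, min_streams):
    # "Map" step: each stream is mined independently, so this loop
    # could be distributed across workers.
    local = []
    for stream in streams:
        wc = WindowCounter(window_size)
        for transaction in stream:
            wc.add(transaction)
        local.append(wc.frequent(min_support))
    # "Reduce" step: count in how many streams each itemset is frequent.
    votes = Counter(s for freq in local for s in freq)
    return {s for s, v in votes.items() if v >= min_streams}


if __name__ == "__main__":
    s1 = [["a", "b"], ["a", "b", "c"], ["a", "c"]]
    s2 = [["a", "b"], ["b", "c"], ["a", "b"]]
    print(sorted(collaborative_frequent([s1, s2], 3, 0.6, 2)))
```

Each transaction is touched only when it enters or leaves the window, which is the property the abstract attributes to the incremental CP-Tree update; a tree structure would additionally share prefixes between itemsets to save memory.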
Pages: 6133-6141
Number of pages: 8