SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Cited by: 21
Authors
Xiao, Wen [1, 2]
Hu, Juan [3]
Affiliations
[1] Hohai Univ, Coll Comp Sci & Informat, Nanjing, Peoples R China
[2] Wanjiang Univ Technol, Key Lab Unmanned Aerial Vehicle Dev & Data Applic, Maanshan, Peoples R China
[3] Wanjiang Univ Technol, Maanshan Engn Technol Res Ctr Wireless Sensor Net, Maanshan, Peoples R China
Keywords
Frequent itemset mining; Streaming data; Sliding window; Distributed; Spark Streaming; CANTREE; TREE
DOI
10.1007/s11227-020-03190-5
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Finding frequent itemsets in continuous streaming data is an important data mining task that is widely used in network monitoring, Internet of Things data analysis, and similar applications. In the era of big data, distributed frequent itemset mining algorithms are needed to process massive data streams. Apache Spark is a unified analytics engine for large-scale data processing that has been applied successfully in many data mining fields. In this paper, we propose SWEclat, a distributed algorithm for mining frequent itemsets over massive streaming data. The algorithm uses a sliding window to process the stream and a vertical data structure to store the dataset in the window. It is implemented on Apache Spark and stores both the streaming data and the vertical-format dataset as Spark RDDs, which are divided into partitions for distributed processing. Experimental results show that SWEclat achieves good speedup, parallel scalability, and load balancing.
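The two ideas named in the abstract, a sliding window over the stream and a vertical (item to transaction-id set) layout mined Eclat-style, can be illustrated with a minimal single-process sketch. This is not the authors' SWEclat implementation and it does not use Spark. The class name `SlidingWindowEclat`, its parameters, and the count-based window are assumptions made for illustration only. In a distributed setting the item-to-tidset map would presumably be split across partitions, which this sketch does not attempt.

```python
from collections import deque
from itertools import count


class SlidingWindowEclat:
    """Toy sliding-window Eclat over a vertical (item -> tidset) layout.

    Hypothetical illustration only; not the SWEclat algorithm from the paper.
    """

    def __init__(self, window_size, min_support):
        self.window_size = window_size  # number of most recent transactions kept
        self.min_support = min_support  # absolute support threshold
        self.window = deque()           # (tid, items) pairs in arrival order
        self.tidsets = {}               # vertical format: item -> set of tids
        self._next_tid = count()

    def add(self, transaction):
        """Append one transaction; evict the oldest if the window overflows."""
        tid = next(self._next_tid)
        items = frozenset(transaction)
        self.window.append((tid, items))
        for item in items:
            self.tidsets.setdefault(item, set()).add(tid)
        if len(self.window) > self.window_size:
            old_tid, old_items = self.window.popleft()
            for item in old_items:
                tids = self.tidsets[item]
                tids.discard(old_tid)
                if not tids:
                    del self.tidsets[item]

    def frequent_itemsets(self):
        """Return {sorted item tuple: support} for the current window."""
        frequent = {}
        singles = sorted(
            (((item,), tids) for item, tids in self.tidsets.items()
             if len(tids) >= self.min_support),
            key=lambda pair: pair[0],
        )
        self._extend(singles, frequent)
        return frequent

    def _extend(self, candidates, frequent):
        # Depth-first Eclat step: all candidates share a prefix, so two of
        # them are joined by intersecting their tidsets.
        for i, (itemset, tids) in enumerate(candidates):
            frequent[itemset] = len(tids)
            children = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= self.min_support:
                    children.append((itemset + (other[-1],), shared))
            self._extend(children, frequent)
```

Because the support of an itemset is the size of a tidset intersection, evicting an expired transaction only requires removing its id from the tidsets of the items it contained. No rescan of the window is needed.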
Pages: 7619-7634
Page count: 16