An improved approach for mining association rules in parallel using Spark Streaming

Cited by: 16

Authors:

Liu, Longtao [1]

Wen, Jiabao [1]

Zheng, Zexun [1]

Su, Hansong [1]

Affiliation:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China

Source:

INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS | 2021 / Vol. 49 / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

Argo; association rules; data mining; parallel computing; Spark Streaming;

DOI:

10.1002/cta.2935

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Parallel computing is an effective way to solve computationally large, data-intensive problems. Traditional data mining algorithms cannot mine association rules from large volumes of streaming data in a timely and effective manner. To improve the speed and accuracy of association rule mining, distributed and parallel algorithms have become a research focus. This paper proposes a parallel FP-growth approach based on Spark Streaming, called SSPFP, which mines frequent itemsets and association rules in parallel from real-time streaming data. The proposed SSPFP algorithm is applied to mining the association rules between temperature and salinity in marine Argo data. The experimental results indicate that the SSPFP algorithm is efficient for association rule mining.
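To make the terms in the abstract concrete, the following is a minimal sketch of association rule mining over a stream consumed in micro-batches. It is not the SSPFP algorithm and does not use FP-growth or Spark; it counts itemsets by brute force in plain Python, only to illustrate support and confidence. The function names (update_counts, rules), the thresholds, and the discretized temperature/salinity labels are all hypothetical and are not taken from the paper or from Argo data.

```python
from collections import Counter
from itertools import combinations


def update_counts(counts, batch, max_size=2):
    """Add every itemset of up to max_size items in one micro-batch to the running counts."""
    for transaction in batch:
        items = sorted(set(transaction))  # canonical order so each itemset has one key
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def rules(counts, n_transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    found = []
    for itemset, count in counts.items():
        support = count / n_transactions
        if len(itemset) != 2 or support < min_support:
            continue
        # Try both directions of the pair: a -> b and b -> a.
        for a, b in (itemset, itemset[::-1]):
            confidence = count / counts[(a,)]
            if confidence >= min_confidence:
                found.append((a, b, support, confidence))
    return sorted(found)


# Hypothetical discretized readings, grouped into two micro-batches.
batches = [
    [["temp=high", "sal=high"], ["temp=high", "sal=high"], ["temp=low", "sal=low"]],
    [["temp=high", "sal=low"], ["temp=low", "sal=low"], ["temp=high", "sal=high"]],
]

counts, n = Counter(), 0
for batch in batches:  # in a streaming system each batch would arrive over time
    update_counts(counts, batch)
    n += len(batch)

found = rules(counts, n, min_support=0.3, min_confidence=0.7)
for a, b, support, confidence in found:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

The running Counter is the only state carried between batches, which is what allows the rules to be recomputed after each batch arrives. A parallel FP-growth method such as the one the abstract describes would replace the brute-force enumeration with a compressed prefix-tree structure and distribute the work across workers.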


Pages: 1028-1039

Page count: 12
