An improved approach for mining association rules in parallel using Spark Streaming

Cited by: 16
Authors
Liu, Longtao [1]
Wen, Jiabao [1]
Zheng, Zexun [1]
Su, Hansong [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Argo; association rules; data mining; parallel computing; Spark Streaming
DOI
10.1002/cta.2935
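Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract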
Parallel computing is an effective way to solve computationally large and data-intensive problems. Traditional data mining algorithms cannot mine association rules from large volumes of streaming data in a timely and effective manner. To improve the speed and accuracy of association rule mining, distributed and parallel algorithms have become a research focus. This paper proposes a parallel FP-growth approach using Spark Streaming, called SSPFP, which mines frequent itemsets and association rules in parallel from real-time streaming data. The proposed SSPFP algorithm is applied to mining association rules between temperature and salinity in marine Argo data. The experimental results indicate that the SSPFP algorithm is efficient for association rule mining.
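The abstract does not describe the internals of SSPFP, so the following is only a minimal sketch of what "frequent itemsets" and "association rules" mean in this setting. It uses brute-force enumeration in plain Python, not FP-growth and not Spark Streaming. The binned temperature and salinity labels, the function names, and both thresholds are hypothetical and are not taken from the paper or from Argo data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Brute-force enumeration for illustration only; FP-growth finds the same
    itemsets without enumerating every candidate.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(freq, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples.

    confidence(A -> B) = support(A and B) / support(A). Every subset of a
    frequent itemset is itself frequent, so freq[antecedent] always exists.
    """
    rules = []
    for itemset, support in freq.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                confidence = support / freq[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Hypothetical micro-batch of binned readings (one set per profile sample).
batch = [
    {"temp=high", "sal=high"},
    {"temp=high", "sal=high"},
    {"temp=high", "sal=low"},
    {"temp=low", "sal=low"},
]

freq = frequent_itemsets(batch, min_support=0.5)
rules = association_rules(freq, min_confidence=0.8)
for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

On this toy batch the only rule that passes both thresholds is `sal=high -> temp=high`, with support 0.5 and confidence 1.0. In a streaming setting, a step like this would presumably run once per micro-batch or window, with the counting distributed across workers.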
Pages: 1028-1039
Page count: 12