Optimization of frequent item set mining parallelization algorithm based on spark platform

Cited by: 0
Authors
Deng, Fan [1]
Wang, Jiabin [1]
Lv, Sheng [1]
Affiliations
[1] Huaqiao Univ, Sch Engn, Quanzhou 362011, Fujian, Peoples R China
Keywords
Frequent pattern mining; Spark parallelization; Transaction compression; Boolean matrices
DOI
10.1007/s10791-024-09470-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose STB_Apriori, a new method that combines the parallelism of the Spark platform with fast frequent itemset mining. Previous research has shown that traditional frequent itemset mining algorithms incur high overhead on large, high-dimensional datasets and generate a large number of candidate itemsets; at the same time, diverse user requirements often produce very sparse and varied data. To mine massive data quickly, our approach builds on Spark's distributed computing capability and on common optimisation ideas from Apriori mining: it uses the efficient BitSet operator to achieve transaction compression, bit-level storage, and data manipulation with Boolean matrices, while parallelising the processing and optimising the algorithmic logic. In experiments on real-world datasets, our method consistently outperforms five widely used methods by a significant margin on very large data and continues to perform well in the remaining cases, demonstrating its effectiveness on real-world tasks. Further analysis shows that increasing the number of distributed nodes continues to improve performance.
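The bit-storage idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' STB_Apriori implementation and it does not use Spark: it assumes each item is stored as one row of a Boolean matrix, packed into a Python integer used as a bitset, so that the support of an itemset is the number of set bits in the bitwise AND of its rows. The function names (`build_item_bitsets`, `support`, `frequent_itemsets`) and the toy data are hypothetical.

```python
from itertools import combinations


def build_item_bitsets(transactions):
    """Map each item to an int whose bit t is set when transaction t contains the item."""
    bitsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(itemset, bitsets):
    """Count transactions containing every item: AND the item rows, then count set bits."""
    items = iter(itemset)
    acc = bitsets[next(items)]
    for item in items:
        acc &= bitsets[item]
    return bin(acc).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search using bitset intersections for support counting."""
    bitsets = build_item_bitsets(transactions)
    current = [frozenset([item]) for item in sorted(bitsets)
               if support([item], bitsets) >= min_support]
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset, bitsets)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then check support.
        previous = set(current)
        current = [c for c in candidates
                   if all(c - {x} in previous for x in c)
                   and support(c, bitsets) >= min_support]
    return result


if __name__ == "__main__":
    toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(frequent_itemsets(toy, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In a distributed setting, the candidate list for each level could be partitioned across workers while the item bitsets are shared read-only, but how the paper actually distributes the work is not stated in the abstract.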
Pages: 19