Optimization of frequent item set mining parallelization algorithm based on spark platform

Cited by: 0
Authors
Deng, Fan [1]
Wang, Jiabin [1]
Lv, Sheng [1]
Affiliations
[1] Huaqiao Univ, Sch Engn, Quanzhou 362011, Fujian, Peoples R China
Keywords
Frequent pattern mining; Spark parallelization; Transaction compression; Boolean matrices
DOI
10.1007/s10791-024-09470-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose STB_Apriori, a method that combines the parallelism of the Spark platform with fast frequent itemset mining. Previous research has shown that traditional frequent itemset mining algorithms incur high overhead on large, high-dimensional datasets and generate a large number of candidate itemsets; in addition, diverse user requirements often produce very sparse and varied data. To mine massive data quickly, our approach draws on Spark's distributed computing capability and on common Apriori optimisations: it uses BitSet for transaction compression and bit storage, manipulates data through Boolean matrices, and parallelises processing while optimising the algorithmic logic. In experiments on real-world datasets, our model consistently outperforms five widely used methods by a significant margin on very large data and continues to perform strongly in the remaining cases, demonstrating its effectiveness on real-world tasks. Further analysis shows that increasing the number of distributed nodes continues to improve performance incrementally.
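The bit-level idea described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' STB_Apriori implementation: it omits Spark entirely, uses plain Python integers as a stand-in for a BitSet, and all function names (`build_item_bitmaps`, `support`, `apriori_bitset`) are hypothetical. Each item gets one row of a Boolean matrix, stored as an integer whose bit t is set when transaction t contains the item; the support of an itemset is then the number of set bits in the AND of its rows.

```python
from itertools import combinations


def build_item_bitmaps(transactions):
    """Build one Boolean-matrix row per item, packed into an int (bit t = transaction t)."""
    bitmaps = {}
    for t, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def support(itemset, bitmaps, n_transactions):
    """Count transactions containing every item: AND the rows, then count set bits."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")


def apriori_bitset(transactions, min_support):
    """Level-wise Apriori that counts support with bitwise AND (illustrative only)."""
    n = len(transactions)
    bitmaps = build_item_bitmaps(transactions)
    current = [frozenset([i]) for i in bitmaps
               if support([i], bitmaps, n) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, bitmaps, n)
        k += 1
        # Join step: merge frequent (k-1)-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
        # and its own support reaches the threshold.
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and support(c, bitmaps, n) >= min_support]
    return frequent


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(apriori_bitset(demo, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

In a distributed setting one would presumably partition the transactions across workers and combine per-partition counts, but how STB_Apriori does this is not specified in the abstract, so the sketch leaves it out.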
Pages: 19