New Spark solutions for distributed frequent itemset and association rule mining algorithms

Cited by: 0

Authors:

Carlos Fernandez-Basso

M. Dolores Ruiz

Maria J. Martin-Bautista

Affiliations:

[1] University of Granada, Dept. of Computer Science and A.I.

[2] University College London, Causal Cognition Lab

Source:

Cluster Computing | 2024 / Volume 27

Keywords:

Big Data; Data Mining; Association Rule; Frequent Itemset; Distributed Computing; Spark

DOI:

Not available

Abstract:

The large amount of data generated every day makes it necessary to re-implement existing methods so that they can handle massive data efficiently. This is the case for association rules, an unsupervised data mining tool that extracts information in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase that precedes association rule mining) from very large databases, high computational cost and insufficient memory remain major problems when processing large data. The aim of this paper is therefore threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new, efficient frequent itemset algorithms for Big Data using distributed computing, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals while varying the number of transactions and the number of items. For this purpose, we have used the Spark platform, which has been shown to outperform existing distributed algorithmic implementations.
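
To make the two phases named in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not the paper's Spark algorithm: it enumerates every subset of every transaction by brute force, which is only feasible for tiny inputs. The function names, the toy transactions, and the thresholds are all assumptions chosen for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Phase 1 (illustrative): return {itemset: support} for every itemset
    whose relative support is at least min_support. Brute force only."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        # Count every non-empty subset of the transaction.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def association_rules(itemsets, min_confidence):
    """Phase 2 (illustrative): derive IF-THEN rules from frequent itemsets.
    confidence(A -> C) = support(A and C together) / support(A)."""
    rules = []
    for itemset, supp in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so the lookup below cannot fail.
                conf = supp / itemsets[a]
                if conf >= min_confidence:
                    rules.append((a, itemset - a, supp, conf))
    return rules

# Hypothetical toy data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(itemsets, min_confidence=0.9)
for a, c, s, conf in rules:
    print(f"IF {sorted(a)} THEN {sorted(c)} (support={s:.2f}, confidence={conf:.2f})")
```

On this toy input, the only rule that reaches the 0.9 confidence threshold is IF butter THEN bread. The exponential subset enumeration in the first function is the kind of computational and memory cost that motivates distributed approaches such as those the abstract describes.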

Pages: 1217-1234

Page count: 17
