Fast algorithms for frequent itemset mining using FP-trees

被引：337

作者：

Grahne, G ^{[1
]}

Zhu, JF ^{[1
]}

机构：

[1] Concordia Univ, Dept Comp Sci, Montreal, PQ H3G 1M8, Canada

来源：

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2005年 / 17卷 / 10期

关键词：

data mining; association rules;

D O I：

10.1109/TKDE.2005.166

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper, we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse data sets. Furthermore, we present new algorithms for mining all, maximal, and closed frequent itemsets. Our algorithms use the FP-tree data structure in combination with the FP-array technique efficiently and incorporate various optimization techniques. We also present experimental results comparing our methods with existing algorithms. The results show that our methods are the fastest for many cases. Even though the algorithms consume much memory when the data sets are sparse, they are still the fastest ones when the minimum support is low. Moreover, they are always among the fastest algorithms and consume less memory than other methods when the data sets are dense.

引用

页码：1347 / 1362

页数：16

共 50 条

[41] Parallelization of Frequent Itemset Mining Methods with FP-tree: An Experiment with PrePost+ Algorithm
Jamsheela, Olakara
Gopalakrishna, Raju
INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2021, 18 (02) : 208 - 213
[42] New Spark solutions for distributed frequent itemset and association rule mining algorithms
Fernandez-Basso, Carlos
Ruiz, M. Dolores
Martin-Bautista, Maria J.
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (02): : 1217 - 1234
[43] Paradigm and performance analysis of distributed frequent itemset mining algorithms based on Mapreduce
Xiao, Wen
Hu, Juan
MICROPROCESSORS AND MICROSYSTEMS, 2021, 82
[44] Research on the four kinds of frequent itemset mining algorithms in market basket analysis
Feng, L
Chen, HS
CONCURRENT ENGINEERING: THE WORLDWIDE ENGINEERING GRID, PROCEEDINGS, 2004, : 329 - 332
[45] New Spark solutions for distributed frequent itemset and association rule mining algorithms
Carlos Fernandez-Basso
M. Dolores Ruiz
Maria J. Martin-Bautista
Cluster Computing, 2024, 27 : 1217 - 1234
[46] A Survey Paper on Frequent Itemset Mining
Sastry, J. S. V. R. S.
Suresh, V
INTERNATIONAL CONFERENCE ON COMPUTER VISION AND MACHINE LEARNING, 2019, 1228
[47] Frequent Itemset Mining in Multirelational Databases
Jimenez, Aida
Berzal, Fernando
Cubero, Juan-Carlos
FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2009, 5722 : 15 - 24
[48] Verified Programs for Frequent Itemset Mining
Loulergue, Frederic
Whitney, Christopher D.
2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2018, : 1516 - 1523
[49] A primer to frequent itemset mining for bioinformatics
Naulaerts, Stefan
Meysman, Pieter
Bittremieux, Wout
Trung Nghia Vu
Vanden Berghe, Wim
Goethals, Bart
Laukens, Kris
BRIEFINGS IN BIOINFORMATICS, 2015, 16 (02) : 216 - 231
[50] Oracle and Vertica for Frequent Itemset Mining
Kyurkchiev, Hristo
Kaloyanova, Kalinka
DATA MINING AND BIG DATA, DMBD 2016, 2016, 9714 : 77 - 85

← 1 2 3 4 5 →