HashEclat: an efficient frequent itemset algorithm

被引：27

作者：

Zhang, Chunkai ^{[1
]}

Tian, Panbo ^{[1
]}

Zhang, Xudong ^{[1
]}

Liao, Qing ^{[1
]}

Jiang, Zoe L. ^{[1
]}

Wang, Xuan ^{[1
]}

机构：

[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China

来源：

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS | 2019年 / 10卷 / 11期

关键词：

Frequent itemset; MinHash; Approximate algorithm; Eclat;

D O I：

10.1007/s13042-018-00918-x

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, the inefficiency for calculating the intersection of itemsets makes it a time-consuming method, especially when the dataset has a large number of transactions. In this work, for the purpose of efficiency improvement, we proposed an approximate Eclat algorithm named HashEclat based on MinHash, which could quickly estimate the size of the intersection set, and adjust the parameters k, E and minSup to consider the tradeoff between accuracy of the mining results and execution time. The parameter k is the top-k parameter of one-permutation MinHash algorithm; the parameter E is the estimate error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result with faster speed maybe more useful than 'exact' result. The theoretical analysis and experiment results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemset with faster speed and less memory space.

引用

页码：3003 / 3016

页数：14

共 50 条

[21] A New Algorithm for Frequent Itemset Generation in Non-Binary Search Space
Kumar, G. Praveen
Sarkar, Anirban
Debnath, Narayan C.
PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, : 149 - +
[22] A primer to frequent itemset mining for bioinformatics
Naulaerts, Stefan
Meysman, Pieter
Bittremieux, Wout
Trung Nghia Vu
Vanden Berghe, Wim
Goethals, Bart
Laukens, Kris
BRIEFINGS IN BIOINFORMATICS, 2015, 16 (02) : 216 - 231
[23] A Depth-first Algorithm of Finding All Association Rules Generated by a Frequent Itemset
武坤
姜保庆
魏庆
Journal of DongHua University, 2006, (06) : 1 - 4
[24] Frequent Itemset Mining for Big Data
Moens, Sandy
Aksehirli, Emin
Goethals, Bart
2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
[25] DP-Eclat:A Vertical Frequent Itemset Mining Algorithm Based on Differential Privacy
Guan, Shengyi
Ma, Xuebin
Li, Wuyungerile
Bai, Xiangyu
2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 1696 - 1703
[26] Frequent itemset-based feature selection and Rider Moth Search Algorithm for document clustering
Yarlagadda, Madhulika
Rao, K. Gangadhara
Srikrishna, A.
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (04) : 1098 - 1109
[27] Frequent Itemset Mining with Local Differential Privacy
Li, Junhui
Gan, Wensheng
Gui, Yijie
Wu, Yongdong
Yu, Philip S.
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1146 - 1155
[28] Research on Distributed Text Clustering Based on Frequent Itemset
Yang, Wenchuan
Wu, Qiwei
Cheng, Zishuai
PROCEEDINGS OF THE 36TH CHINESE CONTROL CONFERENCE (CCC 2017), 2017, : 5700 - 5705
[29] Frequent Itemset Generation using Double Hashing Technique
Jayalakshmi, N.
Vidhya, V.
Krishnamurthy, M.
Kannan, A.
INTERNATIONAL CONFERENCE ON MODELLING OPTIMIZATION AND COMPUTING, 2012, 38 : 1467 - 1478
[30] Frequent Itemset Mining for a Combination of Certain and Uncertain Databases
Wazir, Samar
Ahmad, Tanvir
Beg, M. M. Sufyan
RECENT DEVELOPMENTS AND THE NEW DIRECTION IN SOFT-COMPUTING FOUNDATIONS AND APPLICATIONS, 2018, 361 : 25 - 39

← 1 2 3 4 5 →