HashEclat: an efficient frequent itemset algorithm

被引:27
|
作者
Zhang, Chunkai [1 ]
Tian, Panbo [1 ]
Zhang, Xudong [1 ]
Liao, Qing [1 ]
Jiang, Zoe L. [1 ]
Wang, Xuan [1 ]
机构
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China
关键词
Frequent itemset; MinHash; Approximate algorithm; Eclat;
D O I
10.1007/s13042-018-00918-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, the inefficiency for calculating the intersection of itemsets makes it a time-consuming method, especially when the dataset has a large number of transactions. In this work, for the purpose of efficiency improvement, we proposed an approximate Eclat algorithm named HashEclat based on MinHash, which could quickly estimate the size of the intersection set, and adjust the parameters k, E and minSup to consider the tradeoff between accuracy of the mining results and execution time. The parameter k is the top-k parameter of one-permutation MinHash algorithm; the parameter E is the estimate error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result with faster speed maybe more useful than 'exact' result. The theoretical analysis and experiment results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemset with faster speed and less memory space.
引用
收藏
页码:3003 / 3016
页数:14
相关论文
共 50 条
  • [21] A New Algorithm for Frequent Itemset Generation in Non-Binary Search Space
    Kumar, G. Praveen
    Sarkar, Anirban
    Debnath, Narayan C.
    PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, : 149 - +
  • [22] A primer to frequent itemset mining for bioinformatics
    Naulaerts, Stefan
    Meysman, Pieter
    Bittremieux, Wout
    Trung Nghia Vu
    Vanden Berghe, Wim
    Goethals, Bart
    Laukens, Kris
    BRIEFINGS IN BIOINFORMATICS, 2015, 16 (02) : 216 - 231
  • [23] A Depth-first Algorithm of Finding All Association Rules Generated by a Frequent Itemset
    武坤
    姜保庆
    魏庆
    Journal of DongHua University, 2006, (06) : 1 - 4
  • [24] Frequent Itemset Mining for Big Data
    Moens, Sandy
    Aksehirli, Emin
    Goethals, Bart
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [25] DP-Eclat:A Vertical Frequent Itemset Mining Algorithm Based on Differential Privacy
    Guan, Shengyi
    Ma, Xuebin
    Li, Wuyungerile
    Bai, Xiangyu
    2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 1696 - 1703
  • [26] Frequent itemset-based feature selection and Rider Moth Search Algorithm for document clustering
    Yarlagadda, Madhulika
    Rao, K. Gangadhara
    Srikrishna, A.
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (04) : 1098 - 1109
  • [27] Frequent Itemset Mining with Local Differential Privacy
    Li, Junhui
    Gan, Wensheng
    Gui, Yijie
    Wu, Yongdong
    Yu, Philip S.
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 1146 - 1155
  • [28] Research on Distributed Text Clustering Based on Frequent Itemset
    Yang, Wenchuan
    Wu, Qiwei
    Cheng, Zishuai
    PROCEEDINGS OF THE 36TH CHINESE CONTROL CONFERENCE (CCC 2017), 2017, : 5700 - 5705
  • [29] Frequent Itemset Generation using Double Hashing Technique
    Jayalakshmi, N.
    Vidhya, V.
    Krishnamurthy, M.
    Kannan, A.
    INTERNATIONAL CONFERENCE ON MODELLING OPTIMIZATION AND COMPUTING, 2012, 38 : 1467 - 1478
  • [30] Frequent Itemset Mining for a Combination of Certain and Uncertain Databases
    Wazir, Samar
    Ahmad, Tanvir
    Beg, M. M. Sufyan
    RECENT DEVELOPMENTS AND THE NEW DIRECTION IN SOFT-COMPUTING FOUNDATIONS AND APPLICATIONS, 2018, 361 : 25 - 39