HashEclat: an efficient frequent itemset algorithm

Cited by: 27
Authors
Zhang, Chunkai [1]
Tian, Panbo [1]
Zhang, Xudong [1]
Liao, Qing [1]
Jiang, Zoe L. [1]
Wang, Xuan [1]
Affiliation
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China
Keywords
Frequent itemset; MinHash; Approximate algorithm; Eclat
DOI
10.1007/s13042-018-00918-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, computing the intersections of itemsets is inefficient, which makes the method time-consuming, especially when the dataset contains a large number of transactions. To improve efficiency, we propose an approximate Eclat algorithm named HashEclat. It is based on MinHash, which can quickly estimate the size of an intersection set, and it exposes the parameters k, E and minSup to tune the tradeoff between the accuracy of the mining results and the execution time. The parameter k is the top-k parameter of the one-permutation MinHash algorithm; E is the estimation error of a single intersection size; minSup is the minimum support threshold. In many real situations, a faster approximate result may be more useful than an 'exact' one. Our theoretical analysis and experimental results demonstrate that the proposed algorithm can output almost all of the frequent itemsets in less time and with less memory.
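The abstract's core idea, estimating the size of a tidset intersection from small sketches rather than intersecting the full sets, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a generic bottom-k (one-permutation) MinHash, and the helper names `bottom_k_sketch` and `estimate_intersection`, the BLAKE2b hash, and the sample values of k are all hypothetical choices made for illustration. The parameter E from the abstract is not modeled here.

```python
import hashlib
import heapq


def _hash(tid: int) -> int:
    """Map a transaction id to a 64-bit integer (stands in for one random permutation)."""
    digest = hashlib.blake2b(str(tid).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(tidset, k):
    """Keep only the k smallest hash values of a tidset (assumed sketch layout)."""
    return set(heapq.nsmallest(k, (_hash(t) for t in tidset)))


def estimate_intersection(sketch_a, size_a, sketch_b, size_b, k):
    """Estimate |A ∩ B| from two bottom-k sketches and the exact set sizes."""
    # The k smallest hashes of the union form a bottom-k sketch of A ∪ B.
    union_smallest = heapq.nsmallest(k, sketch_a | sketch_b)
    if not union_smallest:
        return 0.0
    shared = sum(1 for h in union_smallest if h in sketch_a and h in sketch_b)
    jaccard = shared / len(union_smallest)
    # |A ∩ B| = J * |A ∪ B| and |A ∪ B| = (|A| + |B|) / (1 + J).
    return jaccard * (size_a + size_b) / (1.0 + jaccard)


if __name__ == "__main__":
    a = set(range(0, 10_000))       # tidset of a hypothetical itemset X
    b = set(range(5_000, 15_000))   # tidset of a hypothetical itemset Y
    k = 512
    est = estimate_intersection(
        bottom_k_sketch(a, k), len(a), bottom_k_sketch(b, k), len(b), k
    )
    print(f"estimated support of X ∪ Y: {est:.0f} (exact: {len(a & b)})")
```

In an Eclat-style miner, an estimate like this would be compared against minSup in place of the exact intersection size. A larger k tightens the estimate at the cost of more memory and time per comparison, which is the tradeoff the abstract describes.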
Pages: 3003-3016
Page count: 14
Related papers
50 in total
  • [41] HBPFP-DC: A parallel frequent itemset mining using Spark
    Xun, Yaling
    Zhang, Jifu
    Yang, Haifeng
    Qin, Xiao
    PARALLEL COMPUTING, 2021, 101
  • [42] A New DataStructure For Finding Maximum Frequent ItemSet in Online Data Mining
    Yadav, Lakhan
    Nair, Pramod S.
    2015 INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND CONTROL (IC4), 2015,
  • [43] Resource Adaptive Technique for Frequent Itemset Mining in Transactional Data Streams
    Chandrika, J.
    Kumar, K. R. Ananda
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2012, 12 (10): : 87 - 92
  • [44] Visual interface for online watching of frequent itemset generation in Apriori and Eclat
    Mahanti, A
    Alhajj, R
    ICMLA 2005: FOURTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2005, : 404 - 409
  • [45] Association rule mining based fuzzy manta ray foraging optimization algorithm for frequent itemset generation from social media
    Lakshmi, N.
    Krishnamurthy, M.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (10):
  • [46] Efficient Algorithms for mining High Utility Itemset
    Ambulkar, Snehal D.
    Chatur, Prashant N.
    2017 INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRICAL, ELECTRONICS AND COMPUTING TECHNOLOGIES (ICRTEECT), 2017, : 150 - 154
  • [47] Research on Frequent Itemset Mining of Imaging Genetics GWAS in Alzheimer's Disease
    Liang, Hong
    Cao, Luolong
    Gao, Yue
    Luo, Haoran
    Meng, Xianglian
    Wang, Ying
    Li, Jin
    Liu, Wenjie
    GENES, 2022, 13 (02)
  • [48] New Spark solutions for distributed frequent itemset and association rule mining algorithms
    Fernandez-Basso, Carlos
    Ruiz, M. Dolores
    Martin-Bautista, Maria J.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (02): : 1217 - 1234
  • [49] New Spark solutions for distributed frequent itemset and association rule mining algorithms
    Carlos Fernandez-Basso
    M. Dolores Ruiz
    Maria J. Martin-Bautista
    Cluster Computing, 2024, 27 : 1217 - 1234
  • [50] Very Fast Frequent Itemset Mining Simplicial Complex Methods (Extended Abstract)
    Lin, Tsau-Young
    2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 1946 - 1949