HashEclat: an efficient frequent itemset algorithm

被引:26
作者
Zhang, Chunkai [1 ]
Tian, Panbo [1 ]
Zhang, Xudong [1 ]
Liao, Qing [1 ]
Jiang, Zoe L. [1 ]
Wang, Xuan [1 ]
机构
[1] Harbin Inst Technol Shenzhen, Dept Comp Sci & Technol, Shenzhen, Peoples R China
关键词
Frequent itemset; MinHash; Approximate algorithm; Eclat;
D O I
10.1007/s13042-018-00918-x
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The Eclat algorithm is one of the most widely used frequent itemset mining methods. However, the inefficiency for calculating the intersection of itemsets makes it a time-consuming method, especially when the dataset has a large number of transactions. In this work, for the purpose of efficiency improvement, we proposed an approximate Eclat algorithm named HashEclat based on MinHash, which could quickly estimate the size of the intersection set, and adjust the parameters k, E and minSup to consider the tradeoff between accuracy of the mining results and execution time. The parameter k is the top-k parameter of one-permutation MinHash algorithm; the parameter E is the estimate error of one intersection size; the parameter minSup is the minimum support threshold. In many real situations, an approximate result with faster speed maybe more useful than 'exact' result. The theoretical analysis and experiment results that we present demonstrate that the proposed algorithm can output almost all of the frequent itemset with faster speed and less memory space.
引用
收藏
页码:3003 / 3016
页数:14
相关论文
共 34 条
[21]   Is Min-Wise Hashing Optimal for Summarizing Set Intersection? [J].
Pagh, Rasmus ;
Stockel, Morten ;
Woodruff, David P. .
PODS'14: PROCEEDINGS OF THE 33RD ACM SIGMOD-SIGACT-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2014, :109-120
[22]   Strongly correlated quantum walks in optical lattices [J].
Preiss, Philipp M. ;
Ma, Ruichao ;
Tai, M. Eric ;
Lukin, Alexander ;
Rispoli, Matthew ;
Zupancic, Philip ;
Lahini, Yoav ;
Islam, Rajibul ;
Greiner, Markus .
SCIENCE, 2015, 347 (6227) :1229-1233
[23]  
Szmit Radoslaw, 2013, Language Processing and Intelligent Information Systems. 20th International Conference, IIS 2013. Proceedings. LNCS 7912, P171, DOI 10.1007/978-3-642-38634-3_19
[24]  
Teschner M, 2003, VISION, MODELING, AND VISUALIZATION 2003, P47
[25]   Incorporating Diversity and Informativeness in Multiple-Instance Active Learning [J].
Wang, Ran ;
Wang, Xi-Zhao ;
Kwong, Sam ;
Xu, Chen .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2017, 25 (06) :1460-1475
[26]   Discovering the Relationship Between Generalization and Uncertainty by Incorporating Complexity of Classification [J].
Wang, Xi-Zhao ;
Wang, Ran ;
Xu, Chen .
IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (02) :703-715
[27]   A Study on Relationship Between Generalization Abilities and Fuzziness of Base Classifiers in Ensemble Learning [J].
Wang, Xi-Zhao ;
Xing, Hong-Jie ;
Li, Yan ;
Hua, Qiang ;
Dong, Chun-Ru ;
Pedrycz, Witold .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2015, 23 (05) :1638-1654
[28]   A New Approach to Classifier Fusion Based on Upper Integral [J].
Wang, Xi-Zhao ;
Wang, Ran ;
Feng, Hui-Min ;
Wang, Hua-Chao .
IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (05) :620-635
[29]   Non-Naive Bayesian Classifiers for Classification Problems With Continuous Attributes [J].
Wang, Xi-Zhao ;
He, Yu-Lin ;
Wang, Debby D. .
IEEE TRANSACTIONS ON CYBERNETICS, 2014, 44 (01) :21-39
[30]   Design of the fuzzy sliding mode controller for DC motor [J].
Wen, Chih-Chin ;
Chung, Chien-Wen ;
Wang, Hui-Min ;
Chang, Yaote .
2013 SECOND INTERNATIONAL CONFERENCE ON ROBOT, VISION AND SIGNAL PROCESSING (RVSP), 2013, :196-199