An efficient algorithm of frequent itemsets mining based on MapReduce

被引:0
作者
Wang, Le [1 ]
Feng, Lin [1 ]
Zhang, Jing [1 ]
Liao, Pengyu [1 ]
机构
[1] School of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering and School of Innovation and Experiment, Dalian University of Technology
来源
Journal of Information and Computational Science | 2014年 / 11卷 / 08期
关键词
Big data; Data mining; Frequent itemsets; Frequent patterns; Mapreduce;
D O I
10.12733/jics20103619
中图分类号
学科分类号
摘要
Mainstream parallel algorithms for mining frequent itemsets (patterns) were designed by implementing FP-Growth or Apriori algorithms on MapReduce (MR) framework. Existing MR FP-Growth algorithms can not distribute data equally among nodes, and MR Apriori algorithms utilize multiple map/reduce procedures and generate too many key-value pairs with value of 1; these disadvantages hinder their performance. This paper proposes an algorithm FIMMR: it firstly mines local frequent itemsets for each data chunk as candidates, applies prune strategies to the candidates, and then identifies global frequent itemsets from candidates. Experimental results show that the time efficiency of FIMMR outperforms PFP and SPC significantly; and under small minimum support threshold, FIMMR can achieve one order of magnitude improvement than the other two algorithms; meanwhile, the speedup of FIMMR is also satisfactory. Copyright © 2014 Binary Information Press.
引用
收藏
页码:2809 / 2816
页数:7
相关论文
共 16 条
[1]  
Feng L., Wang L., Jin B., UT-Tree: Efficient mining of high utility itemsets from data streams, Intelligent Data Analysis, 17, 4, pp. 585-602, (2013)
[2]  
Vo B., Hong T., Le B., DBV-Miner: A Dynamic Bit-Vector approach for fast mining frequent closed itemsets, Expert Systems with Applications, 39, 8, pp. 7196-7206, (2012)
[3]  
Ahmed C.F., Et al., Efficient tree structures for high utility pattern mining in incremental databases, IEEE Transactions on Knowledge and Data Engineering, 21, 12, pp. 1708-1721, (2009)
[4]  
Pei J., Et al., H-Mine: Fast and space-preserving frequent pattern mining in a large databases, HE Transactions (Institute of Industrial Engineers), 39, 6, pp. 593-605, (2007)
[5]  
Burdick D., Et al., MAFIA: A maximal frequent itemset algorithm, IEEE Transactions on Knowledge and Data Engineering, 17, 11, pp. 1490-1504, (2005)
[6]  
Han J., Pei J., Yin Y., Mining frequent patterns without candidate generation, ACM SIGMOD International Conference on Management of Data, (2000)
[7]  
Lin D.I., Kedem Z.M., Pincer search: A new algorithm for discovering the maximum frequent set, Advances in Database Technology, 1377, pp. 103-119, (1998)
[8]  
Agrawal R., Srikant R., Fast algorithms for mining association rules in large databases, International Conference on Very Large Data Bases, (1994)
[9]  
Wang L., Feng L., Jin B., Sliding window-based frequent itemsets mining over data streams using tail pointer table, International Journal of Computational Intelligence Systems, 7, 1, pp. 1-12, (2013)
[10]  
Zhou L., Et al., Balanced parallel FP-growth with mapreduce, 2010 IEEE Youth Conference on Information, Computing and Telecommunications, (2010)