Sequence-Growth: A Scalable and Effective Frequent Itemset Mining Algorithm for Big Data Based on MapReduce Framework

Cited by: 18
Authors
Liang, Yen-hui [1]
Wu, Shiow-yang [1]
Affiliations
[1] Natl Dong Hwa Univ, Dept Comp Sci & Informat Engn, Hualien, Taiwan
Source
2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015 | 2015
Keywords
Frequent Pattern Mining; MapReduce; Big Data; Scalability; Efficiency; PATTERNS;
DOI
10.1109/BigDataCongress.2015.65
Chinese Library Classification (CLC) number
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
Frequent itemset mining (FIM) is an important research topic because it is widely applied in the real world to find frequent itemsets and to mine human behavior patterns. The FIM process is both memory- and compute-intensive. As data grows exponentially every day, the problems of efficiency and scalability become more severe. In this paper, we propose a new distributed FIM algorithm, called Sequence-Growth, and implement it on the MapReduce framework. Our algorithm applies the idea of lexicographical order to construct a tree, called the "lexicographical sequence tree", that allows us to find all frequent itemsets without an exhaustive search over the transaction databases. In addition, the breadth-wide support-based pruning strategy is an important contributor to the efficiency and scalability of our algorithm. To test the performance of our algorithm, we conduct experiments covering various aspects on the MapReduce framework with large datasets. The results show the good efficiency and scalability of Sequence-Growth, especially when dealing with big data and long itemsets. Our algorithm also introduces a new mining methodology that can be easily modified for sequential pattern mining, trajectory pattern mining, and other association rule mining algorithms. We believe that it will make a valuable contribution to the future development of association rule mining algorithms for big data.
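The abstract gives no pseudocode, so the following is only a rough single-machine sketch of the general idea it describes: itemsets are grown in lexicographic order, one level at a time, and any candidate below the support threshold is pruned before the next level. It is not the authors' Sequence-Growth implementation and does not use MapReduce; the function name `mine_frequent_itemsets` and its parameters are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def mine_frequent_itemsets(transactions, min_support):
    """Illustrative level-wise miner (hypothetical, not the paper's code).

    Each itemset is stored as a tuple of items in sorted order, so every
    itemset has exactly one path from the root of the (implicit) tree.
    Returns a dict mapping each frequent itemset to its support count.
    """
    # Canonical form: deduplicated, lexicographically sorted transactions.
    sorted_txns = [tuple(sorted(set(t))) for t in transactions]
    txn_sets = [set(t) for t in sorted_txns]

    # Level 1: count single items and keep those meeting min_support.
    counts = defaultdict(int)
    for txn in sorted_txns:
        for item in txn:
            counts[(item,)] += 1
    frontier = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(frontier)

    # Grow one level per pass; only frequent prefixes are extended.
    while frontier:
        counts = defaultdict(int)
        for txn, txn_set in zip(sorted_txns, txn_sets):
            for prefix in frontier:
                if not txn_set.issuperset(prefix):
                    continue
                # Extend only with items that sort after the prefix's last
                # item, so each candidate is generated exactly once.
                for item in txn:
                    if item > prefix[-1]:
                        counts[prefix + (item,)] += 1
        # Support-based pruning applied to the whole level at once.
        frontier = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(frontier)

    return frequent
```

In a distributed setting, the per-transaction counting loop would roughly correspond to a map step and the summation plus threshold filter to a reduce step, but that mapping is an assumption made here for illustration rather than a description of the paper's design.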
Pages: 393-400
Number of pages: 8
Related papers
22 entries in total
[1]  
Anastasiu D. C, 2014, FREQUENT PATTERN MIN, P225, DOI 10.1007/978-3-319-07821-2_10
[2]  
[Anonymous], 1994, P 20 INT C VER LARG
[3]  
[Anonymous], P 11 INT C DAT ENG I
[4]  
Chen C., 2013, BIG DAT BIGDATA C 20
[5]  
Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137
[6]  
Farzanyar Z., 2013, P 2013 IEEE ACM INT
[7]   Accelerating Frequent Itemsets Mining on the Cloud: A MapReduce-Based Approach [J].
Farzanyar, Zahra ;
Cercone, Nick .
2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2013, :592-598
[8]  
Giannotti F., 2007, P 13 ACM SIGKDD INT
[9]  
HAMMOUD S, 2011, THESIS
[10]  
Han Jiawei, 2000, ACM SIGMOD RECORD, V29