Positive and negative association rule mining in Hadoop's MapReduce environment

Cited by: 15
Authors
Bagui, Sikha [1]
Dhar, Probal Chandra [1]
Affiliation
[1] Univ West Florida, Dept Comp Sci, Pensacola, FL 32514 USA
Keywords
Positive association rule mining; Negative association rule mining; Hadoop; MapReduce; Apriori; Big data; Frequent itemset mining; Parallel environment; Hadoop's Distributed File System (HDFS)
DOI
10.1186/s40537-019-0238-8
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
In this paper, we present a Hadoop implementation of the Apriori algorithm. Using Hadoop's distributed and parallel MapReduce environment, we present an architecture for mining both positive and negative association rules in big data using frequent itemset mining and the Apriori algorithm. We also analyze and present results for a few optimization parameters of Hadoop's MapReduce environment as they relate to this algorithm. Results are reported in terms of the number of rules generated and the run-time efficiency. We find that a higher degree of parallelization, which here means larger block sizes, increases the run-time efficiency of the Hadoop implementation of the Apriori algorithm.
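The abstract describes counting candidate itemsets in parallel with MapReduce and then deriving both positive and negative rules from those counts. The following is a minimal in-process sketch of that flow under stated assumptions, not the authors' Hadoop code: HDFS input splits are simulated as in-memory lists, and the names `map_phase`, `reduce_phase`, `apriori_mapreduce`, `exact_count` and `rule_measures`, as well as the toy transactions, are hypothetical. For negative rules it assumes a common formulation in which supp(A ∪ ¬B) = supp(A) − supp(A ∪ B) and similar identities; the paper may use different measures or thresholds.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two in-memory "splits" standing in for HDFS blocks.
SPLITS = [
    [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}],
    [{"milk"}, {"bread", "milk"}, {"butter"}],
]


def map_phase(split, candidates):
    """Mapper: emit (itemset, 1) for each candidate contained in a transaction."""
    for transaction in split:
        for itemset in candidates:
            if itemset <= transaction:
                yield itemset, 1


def reduce_phase(pairs, min_count):
    """Reducer: sum partial counts per itemset and keep those meeting min_count."""
    totals = Counter()
    for itemset, one in pairs:
        totals[itemset] += one
    return {s: c for s, c in totals.items() if c >= min_count}


def apriori_mapreduce(splits, min_count):
    """One simulated MapReduce pass per itemset size k, as in level-wise Apriori."""
    items = {i for split in splits for t in split for i in t}
    candidates = [frozenset([i]) for i in items]
    frequent, k = {}, 1
    while candidates:
        pairs = [p for split in splits for p in map_phase(split, candidates)]
        level = reduce_phase(pairs, min_count)
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset (Apriori property).
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def exact_count(splits, itemset):
    """Exact count via one extra pass; A|B may fall below the support threshold."""
    pairs = [p for split in splits for p in map_phase(split, [itemset])]
    return reduce_phase(pairs, 0).get(itemset, 0)


def rule_measures(splits, a, b):
    """Support and confidence of A=>B and its three negative forms (A, B disjoint).

    Assumes the common identities supp(A and not B) = supp(A) - supp(A|B), etc.
    """
    n = sum(len(split) for split in splits)
    s_a = exact_count(splits, a) / n
    s_b = exact_count(splits, b) / n
    s_ab = exact_count(splits, a | b) / n
    support = {
        "A=>B": s_ab,
        "A=>~B": s_a - s_ab,
        "~A=>B": s_b - s_ab,
        "~A=>~B": 1 - s_a - s_b + s_ab,
    }
    antecedent = {"A=>B": s_a, "A=>~B": s_a, "~A=>B": 1 - s_a, "~A=>~B": 1 - s_a}
    confidence = {r: support[r] / antecedent[r] if antecedent[r] > 0 else 0.0
                  for r in support}
    return support, confidence


if __name__ == "__main__":
    freq = apriori_mapreduce(SPLITS, min_count=2)
    for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    support, confidence = rule_measures(SPLITS, frozenset({"bread"}), frozenset({"milk"}))
    for rule in support:
        print(rule, round(support[rule], 3), round(confidence[rule], 3))
```

Running the script prints the frequent itemsets found with a minimum count of 2, followed by the four rule measures for {bread} and {milk}. Each simulated job follows the split-local mapper plus global reducer pattern; candidate generation runs between jobs because pruning needs the complete counts for level k-1.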
Pages: 16