Web Log Mining Algorithm Based on Hadoop and MongoDB

Cited by: 0
Authors
Xiao Feng [1]
Xie Jian [1]
Rong Huigui [1]
Huo Shengxu [1]
Affiliation
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
Source
INTERNATIONAL CONFERENCE ON COMPUTATIONAL AND INFORMATION SCIENCES (ICCIS 2014) | 2014
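Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract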
The Apriori algorithm is one of the basic algorithms for mining frequent itemsets in order to generate Boolean association rules. However, the algorithm requires repeated scans of the whole dataset, which results in an I/O bottleneck. To solve this problem, this paper proposes an improved algorithm, AprioriHM. AprioriHM reduces the number of full-dataset scans and uses Hadoop to manage the details of parallel processing automatically. Moreover, AprioriHM uses a distributed database, MongoDB, to relieve the I/O bottleneck. Experimental results show that AprioriHM outperforms the conventional Apriori algorithm.
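To make the scan cost mentioned in the abstract concrete, the following is a minimal single-machine sketch of the conventional Apriori algorithm. It is not the paper's AprioriHM and uses neither Hadoop nor MongoDB. The function name `apriori`, the `scans` counter, and the sample `sessions` data are assumptions introduced here for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return ({frequent itemset: support count}, number of counting passes)."""
    transactions = [frozenset(t) for t in transactions]
    scans = 0
    frequent = {}

    # Level-1 candidates: every distinct item as a 1-itemset.
    # (Collecting the items also reads the data; only the support-counting
    # passes below are tallied in `scans`.)
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # One full pass over the dataset per level: the repeated scan.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical web-log sessions: each set holds the pages visited in one session.
sessions = [
    {"/home", "/cart", "/pay"},
    {"/home", "/cart"},
    {"/home", "/pay"},
    {"/cart", "/pay"},
    {"/home", "/cart", "/pay"},
]
frequent, scans = apriori(sessions, min_support=3)
```

In this sketch every candidate level triggers another pass over all sessions, so the number of passes grows with the size of the longest candidate itemset. That per-level pass is the cost the abstract says AprioriHM aims to reduce.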
Pages: 246-253
Page count: 8