Web Log Mining Algorithm Based on Hadoop and MongoDB

Cited by: 0

Authors:

Xiao Feng [1]

Xie Jian [1]

Rong Huigui [1]

Huo Shengxu [1]

Affiliations:

[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China

Source:

INTERNATIONAL CONFERENCE ON COMPUTATIONAL AND INFORMATION SCIENCES (ICCIS 2014) | 2014

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The Apriori algorithm is one of the basic algorithms for mining frequent itemsets to generate Boolean association rules. However, the algorithm requires repeated scans of the whole dataset, which results in an I/O bottleneck. To solve this problem, this paper proposes an improved algorithm, AprioriHM. AprioriHM reduces the number of full-dataset scans and uses Hadoop to automatically manage the details of parallel processing. Moreover, AprioriHM uses a distributed database, MongoDB, to relieve I/O bottlenecks. Experimental results show that AprioriHM outperforms the conventional Apriori algorithm.
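
To make the scanning cost mentioned in the abstract concrete, the following is a minimal sketch of the conventional Apriori algorithm only; it is not the paper's AprioriHM and involves neither Hadoop nor MongoDB. The function name `apriori`, the `scans` counter, and the sample page paths are hypothetical, and an in-memory list stands in for the web log dataset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return (frequent itemsets with support counts, number of full scans)."""
    transactions = [frozenset(t) for t in transactions]
    # Level-1 candidates: every distinct item as a singleton itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        # One full pass over the dataset per level: the repeated scan
        # that the abstract describes as the source of the I/O bottleneck.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-item
        # subsets were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical sessions: each set holds the pages visited in one session.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/cart"},
    {"/products", "/cart"},
    {"/home", "/products", "/cart"},
]
frequent, scans = apriori(sessions, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("full dataset scans:", scans)
```

In this sketch the dataset is read once for every itemset size that still has candidates, so the number of full scans grows with the length of the longest candidate itemset. That per-level scan is the cost the abstract says AprioriHM reduces.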

Pages: 246-253

Page count: 8