Efficient Mining of Frequent itemsets in Social Network Data based on MapReduce Framework

Cited by: 0
Authors
Farzanyar, Zahra [1]
Cercone, Nick [1]
Affiliations
[1] York Univ, Comp Sci & Engn Dept, Toronto, ON M3J 2R7, Canada
Source
2013 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM) | 2013
Keywords
Social networks; Frequent Itemset Mining; Cloud Computing; MapReduce
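DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract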
Social networks promote information sharing between people everywhere and at all times. Mining the data produced in this data-rich environment can be extremely useful. Frequent itemset mining plays an important role in mining associations, correlations, sequential patterns, causality, episodes, multidimensional patterns, max-patterns, partial periodicity, emerging patterns, and many other significant data mining tasks in social networks. With the exponential growth of social network data towards a terabyte or more, most traditional frequent itemset mining algorithms become ineffective because of either huge resource requirements or large communication overhead. Cloud computing has shown that very large datasets can be processed on commodity clusters, given the right programming model. MapReduce, a parallel programming model and one of the most important techniques for cloud computing, has emerged for mining datasets of terabyte scale or larger on clusters of computers. In this paper, we propose an efficient frequent itemset mining algorithm, called IMRApriori, based on the MapReduce framework and running on a Hadoop cloud, a parallel storage and computing platform. The paper presents experimental results to corroborate the theoretical claims.
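As a rough illustration of the kind of computation the abstract refers to, the sketch below counts k-itemsets in a map/reduce style using plain Python. It is not the IMRApriori algorithm, whose details are not given in this record, and it uses no Hadoop API. It enumerates every k-item subset of each transaction without Apriori-style candidate pruning. All function names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(split, k):
    """Mapper (hypothetical): emit (itemset, 1) for each k-item subset of a transaction."""
    for transaction in split:
        # Sort and deduplicate so the same itemset always yields the same key.
        for itemset in combinations(sorted(set(transaction)), k):
            yield itemset, 1


def reduce_phase(pairs, min_count):
    """Reducer (hypothetical): sum counts per itemset, keep those with at least min_count."""
    totals = Counter()
    for itemset, count in pairs:
        totals[itemset] += count
    return {itemset: n for itemset, n in totals.items() if n >= min_count}


def frequent_itemsets(splits, k, min_count):
    """Run the map phase over every data split, then reduce the combined output."""
    pairs = (pair for split in splits for pair in map_phase(split, k))
    return reduce_phase(pairs, min_count)


# Assumed toy data: two splits standing in for blocks of a distributed file.
splits = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "d"], ["a", "b", "c"]],
]
frequent_pairs = frequent_itemsets(splits, k=2, min_count=3)
```

In a real cluster each split would be handled by a separate mapper and the per-itemset sums would be computed by reducers after a shuffle step. Here both phases run in one process only to show the data flow.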
Pages: 1183-1188
Number of pages: 6