An Efficient MapReduce-Based Parallel Processing Framework for User-Based Collaborative Filtering

Cited by: 9
Authors
Jeong, Hanjo [1]
Cha, Kyung Jin [2]
Affiliations
[1] Kwangwoon Univ, Sch Informat Convergence, Seoul 01897, South Korea
[2] Kangwon Univ, Dept Business Adm, Chunchon 24341, South Korea
Source
SYMMETRY-BASEL | 2019 / Vol. 11 / Issue 06
Keywords
MapReduce; collaborative filtering; parallel processing; Hadoop; recommendation system
DOI
10.3390/sym11060748
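Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract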
User-based collaborative filtering is one of the most widely used methods in recommender systems. However, it is slow to run, because finding the neighboring users of each active user (those with similar rating patterns) requires a full scan of the entire dataset. It also requires time-consuming computations because of the complexity of the algorithms. Furthermore, the amount of rating data in recommender systems grows rapidly as the number of users, items, and rating activities increases. Thus, a big data framework with parallel processing, such as Hadoop, is needed for recommender systems. Many studies have already examined MapReduce-based parallel processing for collaborative filtering. However, most of them have not considered the sequential-access restriction on executing MapReduce jobs, which arises because the Hadoop Distributed File System (HDFS) accesses data on disk sequentially, or the need to minimize full scans of the entire data on HDFS. In this paper, we introduce an efficient MapReduce-based parallel processing framework for collaborative filtering that requires only a one-time parallelized full scan while adhering to the sequential access patterns on Hadoop data nodes. The proposed framework includes a novel MapReduce design with a partial computation scheme that calculates the predictions and finds the recommended items for an active user within that single parallelized scan. Lastly, we use the MovieLens dataset to show the validity of the proposed method, mainly in terms of the efficiency of the parallelized method.
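Illustrative sketch (not from the paper): the abstract's idea of computing predictions from partial sums gathered in one sequential pass over the ratings can be shown in a few lines of single-machine Python. This is not the authors' framework and does not use Hadoop. The function names (scan_once, predict), the cosine similarity over co-rated items, and the toy ratings are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def scan_once(records, active_ratings):
    """One sequential pass over (user, item, rating) records.

    Accumulates, per user, the partial sums needed for a cosine
    similarity with the active user, plus that user's ratings on
    items the active user has not rated (prediction candidates).
    """
    dot = defaultdict(float)        # sum of r_user * r_active over co-rated items
    norm_u = defaultdict(float)     # sum of r_user^2 over co-rated items
    norm_a = defaultdict(float)     # sum of r_active^2 over co-rated items
    candidates = defaultdict(list)  # (item, rating) pairs unseen by the active user
    for user, item, rating in records:
        if item in active_ratings:
            a = active_ratings[item]
            dot[user] += rating * a
            norm_u[user] += rating * rating
            norm_a[user] += a * a
        else:
            candidates[user].append((item, rating))
    return dot, norm_u, norm_a, candidates


def predict(records, active_ratings):
    """Similarity-weighted average rating for each item unseen by the active user."""
    dot, norm_u, norm_a, candidates = scan_once(records, active_ratings)
    num = defaultdict(float)
    den = defaultdict(float)
    for user, rated in candidates.items():
        nu, na = norm_u.get(user, 0.0), norm_a.get(user, 0.0)
        if nu == 0.0 or na == 0.0:
            continue  # no co-rated items, so similarity is undefined
        sim = dot[user] / (sqrt(nu) * sqrt(na))
        for item, rating in rated:
            num[item] += sim * rating
            den[item] += abs(sim)
    return {item: num[item] / den[item] for item in num if den[item] > 0.0}


# Hypothetical toy data: the active user has rated items A and B only.
active = {"A": 5.0, "B": 3.0}
records = [
    ("u1", "A", 5.0), ("u1", "B", 3.0), ("u1", "C", 4.0),
    ("u2", "A", 1.0), ("u2", "B", 5.0), ("u2", "C", 2.0),
    ("u3", "C", 5.0),  # shares no items with the active user, so it is ignored
]
predictions = predict(records, active)
```

In a MapReduce setting, the per-user accumulation in scan_once would roughly correspond to work done by mappers and combiners, and the weighted average in predict to a reducer. That mapping is an interpretation of the abstract, not a description of the published design.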
Pages: 8