clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

Cited by: 10
Authors
Chen, Jing [1]
Fang, Jianbin [1]
Liu, Weifeng [2]
Tang, Tao [1]
Yang, Canqun [1]
Affiliations
[1] National University of Defense Technology, College of Computer, Changsha, People's Republic of China
[2] Norwegian University of Science and Technology, Department of Computer Science, Trondheim, Norway
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2020 / Volume 108
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Matrix factorization; Alternating least squares; Performance; RECOMMENDER; SYSTEMS; MEMORY;
DOI
10.1016/j.future.2018.04.071
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core and many-core processors. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS solver (clMF) for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique, the fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation performs 2.8x-15.7x faster on an Intel 16-core CPU, 23.9x-87.9x faster on an NVIDIA K20C GPU and 34.6x-97.1x faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, our implementation also outperforms cuMF over different latent features ranging from 10 to 100 with various real-world recommendation datasets. (C) 2018 Elsevier B.V. All rights reserved.
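For readers unfamiliar with ALS, the following is a minimal sequential sketch of the general technique the abstract refers to. It is not the clMF implementation, which is written in OpenCL and uses optimizations (thread batching, fine-grained tiling) that are not shown here. The function names (`solve`, `update`, `als`, `objective`), the plain `lam * I` regularization and the toy ratings are all assumptions made for illustration. Each row's factor vector is obtained by solving a small regularized least-squares system while the other side is held fixed.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def update(fixed, observed, f, lam):
    """Recompute one side's factors with the other side held fixed.

    observed[k] lists (index into fixed, rating) pairs for row k.
    Rows are independent of each other, which is the property that
    parallel ALS solvers exploit.
    """
    updated = []
    for obs in observed:
        # Normal equations: (sum y y^T + lam * I) x = sum r * y
        A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
        rhs = [0.0] * f
        for idx, r in obs:
            y = fixed[idx]
            for a in range(f):
                rhs[a] += r * y[a]
                for b in range(f):
                    A[a][b] += y[a] * y[b]
        updated.append(solve(A, rhs))
    return updated

def als(ratings, m, n, f=2, lam=0.1, iters=10, seed=0):
    """Factorize sparse (user, item, rating) triples into X (m x f), Y (n x f)."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(m)]
    by_item = [[] for _ in range(n)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    Y = [[rng.random() for _ in range(f)] for _ in range(n)]
    X = []
    for _ in range(iters):
        X = update(Y, by_user, f, lam)  # fix items, solve for users
        Y = update(X, by_item, f, lam)  # fix users, solve for items
    return X, Y

def objective(ratings, X, Y, lam):
    """Regularized squared error that each ALS half-step minimizes."""
    err = sum((r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
              for u, i, r in ratings)
    reg = sum(v * v for row in X + Y for v in row)
    return err + lam * reg

if __name__ == "__main__":
    # Hypothetical toy data: (user, item, rating)
    toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    X, Y = als(toy, m=3, n=3)
    print(objective(toy, X, Y, 0.1))
```

Because every half-step solves its block exactly, the regularized objective cannot increase from one iteration to the next, which makes it a convenient sanity check for any ALS implementation.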
Pages: 1192-1205
Page count: 14