clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization

被引:10
|
作者
Chen, Jing [1 ]
Fang, Jianbin [1 ]
Liu, Weifeng [2 ]
Tang, Tao [1 ]
Yang, Canqun [1 ]
机构
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] Norwegian Univ Sci & Technol, Dept Comp Sci, Trondheim, Norway
基金
中国国家自然科学基金; 国家重点研发计划;
关键词
Matrix factorization; Alternating least squares; Performance; RECOMMENDER; SYSTEMS; MEMORY;
D O I
10.1016/j.future.2018.04.071
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Alternating least squares (ALS) has been proved to be an effective solver for matrix factorization in recommender systems. To speed up factorizing performance, various parallel ALS solvers have been proposed to leverage modern multi-cores and many-cores. Existing implementations are limited in either speed or portability. In this paper, we present an efficient and portable ALS solver (clMF) for recommender systems. On one hand, wediagnose the baseline implementation and observe that it lacks of the awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique, the fine-grained tiling technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to efficiently map it to the underlying hardware. The experimental results show that our implementation performs 2.8x-15.7x faster on an Intel 16-core CPU, 23.9x-87.9x faster on an NVIDIA K20C GPU and 34.6x-97.1x faster on an AMD Fury X GPU than the baseline implementation. On the K20C GPU, our implementation also outperforms cuMF over different latent features ranging from 10 to 100 with various real-world recommendation datasets. (C) 2018 Elsevier B.V. All rights reserved.
引用
收藏
页码:1192 / 1205
页数:14
相关论文
共 50 条
  • [1] BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs
    Chen, Jing
    Fang, Jianbin
    Liu, Weifeng
    Yang, Canqun
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (09) : 2291 - 2302
  • [2] FINE-GRAINED PARALLEL INCOMPLETE LU FACTORIZATION
    Chow, Edmond
    Patel, Aftab
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (02): : C169 - C193
  • [3] Convergence of a Fast Hierarchical Alternating Least Squares Algorithm for Nonnegative Matrix Factorization
    Hou, Liangshao
    Chu, Delin
    Liao, Li-Zhi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (01) : 77 - 89
  • [4] Hybrid projective nonnegative matrix factorization based on α-divergence and the alternating least squares algorithm
    Belachew, Melisew Tefera
    Del Buono, Nicoletta
    APPLIED MATHEMATICS AND COMPUTATION, 2020, 369
  • [5] A coarse-grained parallel QR-factorization algorithm for sparse least squares problems
    Ostromsky, T
    Hansen, PC
    Zlatev, Z
    PARALLEL COMPUTING, 1998, 24 (5-6) : 937 - 964
  • [6] A FINE-GRAINED PARALLEL MEMORY COMPACTION ALGORITHM
    WEEMEEUW, P
    DEMOEN, B
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1994, 20 (02) : 176 - 186
  • [7] Self-stabilizing fine-grained parallel incomplete LU factorization
    Coleman, Evan
    Sosonkinab, Masha
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2018, 19 : 291 - 304
  • [8] A Progressive Hierarchical Alternating Least Squares Method for Symmetric Nonnegative Matrix Factorization
    Hou, Liangshao
    Chu, Delin
    Liao, Li-Zhi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 5355 - 5369
  • [9] Fine-grained parallel genetic algorithm: A global convergence criterion
    Muhammad, A
    Bargiela, A
    King, G
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 1999, 73 (02) : 139 - 155
  • [10] Fine-grained parallel genetic algorithm: A global convergence criterion
    Southampton Institute, Southampton, United Kingdom
    不详
    Int J Comput Math, 2 (139-155):