Parallel Adaptive Stochastic Gradient Descent Algorithms for Latent Factor Analysis of High-Dimensional and Incomplete Industrial Data

Cited: 4
Authors
Qin, Wen [1 ,2 ]
Luo, Xin [3 ]
Li, Shuai [4 ,5 ]
Zhou, MengChu [6 ,7 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China
[2] Chinese Acad Sci, Chongqing Inst Green & Intelligent Technol, Chongqing 400714, Peoples R China
[3] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[4] Univ Oulu, Fac Informat Technol & Elect Engn, Oulu 90570, Finland
[5] Technol Res Ctr Finland VTT, Oulu 90570, Finland
[6] Zhejiang Gongshang Univ, Sch Informat & Elect Engn, Hangzhou 310018, Peoples R China
[7] New Jersey Inst Technol, Helen & John C Hartmann Dept Elect & Comp Engn, Newark, NJ 07102 USA
Funding
National Natural Science Foundation of China;
Keywords
Adaptation models; Training; Data models; Convergence; Stochastic processes; Sparse matrices; Tuning; Big data; latent factor analysis; adaptive model; parallelization; machine learning; stochastic gradient descent; high-dimensional and incomplete matrix; MATRIX FACTORIZATION; SIDE INFORMATION; RECOMMENDATION; OPTIMIZATION; NETWORK;
DOI
10.1109/TASE.2023.3267609
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Latent factor analysis (LFA) is efficient in knowledge discovery from a high-dimensional and incomplete (HDI) matrix, which is frequently encountered in industrial big data-related applications. A stochastic gradient descent (SGD) algorithm is commonly adopted as the learning algorithm for LFA owing to its high efficiency. However, its sequential nature makes it less scalable when processing large-scale data. Although alternating SGD decouples an LFA process to achieve parallelization, its performance relies on hyper-parameters that are highly expensive to tune. To address this issue, this paper presents three extended alternating SGD algorithms whose hyper-parameters are made adaptive through particle swarm optimization. Correspondingly, three Parallel Adaptive LFA (PAL) models are proposed, which achieve highly efficient latent factor acquisition from an HDI matrix. Experiments have been conducted on four HDI matrices collected from industrial applications; the benchmark models are LFA models based on state-of-the-art parallel SGD algorithms, including alternating SGD, Hogwild!, distributed gradient descent, and sparse matrix factorization parallelization. The results demonstrate that, with 32 threads, the proposed PAL models achieve significant speedup over the benchmarks, and they achieve the highest prediction accuracy for missing data in most cases. Note to Practitioners-HDI data are commonly encountered in many industrial big data-related applications, from which rich knowledge and patterns can be extracted efficiently. An SGD-based LFA model is popular for addressing HDI data due to its efficiency. Yet when dealing with large-scale HDI data, its serial nature greatly reduces its scalability. Although alternating SGD can decouple an LFA process to enable parallelization, its performance depends on hyper-parameters whose tuning is tedious.
To address this vital issue, this study proposes three extended alternating SGD algorithms whose hyper-parameters are made adaptive through a particle swarm optimizer. Based on them, three models are realized that efficiently obtain latent factors from HDI matrices. Compared with existing state-of-the-art models, they offer a hyper-parameter-adaptive learning process as well as highly competitive computational efficiency and representation learning ability. Hence, they provide practitioners with more scalable solutions when addressing large HDI data from industrial applications.
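To make the abstract's core idea concrete, the following is a minimal sketch (not the paper's exact algorithm) of SGD-based latent factor analysis on an HDI matrix stored as its observed entries only. The toy matrix, rank, and the fixed learning rate `eta` and regularization `lam` are illustrative assumptions; in the paper's PAL models such hyper-parameters are instead adapted by particle swarm optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# HDI matrix represented only by its observed (row, col, value) triples.
observed = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (2, 0, 1.0), (2, 2, 2.0)]
n_rows, n_cols, rank = 3, 3, 2
eta, lam = 0.01, 0.05  # learning rate and L2 regularization (illustrative values)

P = 0.1 * rng.standard_normal((n_rows, rank))  # row latent factors
Q = 0.1 * rng.standard_normal((n_cols, rank))  # column latent factors

for epoch in range(500):
    for i, j, r in observed:
        err = r - P[i] @ Q[j]                    # error on one observed entry
        P[i] += eta * (err * Q[j] - lam * P[i])  # SGD update of row factor
        Q[j] += eta * (err * P[i] - lam * Q[j])  # SGD update of column factor

rmse = np.sqrt(np.mean([(r - P[i] @ Q[j]) ** 2 for i, j, r in observed]))
print(f"training RMSE: {rmse:.3f}")
```

The inner loop touches one observed entry at a time, which is what makes plain SGD sequential; alternating SGD gains parallelism by fixing one factor matrix while updating the other, so updates to different rows (or columns) become independent and can be distributed across threads.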
Pages: 2716-2729
Page count: 14