Accelerating Matrix Factorization by Overparameterization

Times cited: 12

Authors:

Chen, Pu [1]

Chen, Hung-Hsuan [1]

Affiliation:

[1] Natl Cent Univ, Comp Sci & Informat Engn, Taoyuan, Taiwan

Source:

PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA) | 2020

Keywords:

Matrix Factorization; Collaborative Filtering; SVD; Recommender Systems; Overparameterization

DOI:

10.5220/0009885600890097

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper studies overparameterization of the matrix factorization (MF) model. We confirm that overparameterization can significantly accelerate the optimization of MF without changing the expressiveness of the model. Consequently, modern recommendation applications based on MF or its variants can benefit substantially from this finding. Specifically, we show theoretically that applying vanilla stochastic gradient descent (SGD) to the overparameterized MF model is equivalent to applying gradient descent with momentum and an adaptive learning rate to the standard MF model. We empirically compare the overparameterized MF model with the standard MF model under various optimizers, including vanilla SGD, AdaGrad, Adadelta, RMSprop, and Adam, on several public datasets. The experimental results agree with our analysis: the overparameterized model converges faster. The overparameterization technique can be applied to various learning-based recommendation models, such as SVD++, nonnegative matrix factorization (NMF), and factorization machines (FM), as well as deep learning-based models such as NeuralCF, Wide&Deep, and DeepFM. We therefore recommend using overparameterization to accelerate the training of learning-based recommendation models whenever possible, especially when the training dataset is large.
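The claimed equivalence between SGD on an overparameterized model and momentum plus an adaptive learning rate can be illustrated in a toy setting. The derivation below is a sketch under assumptions that do not come from the paper. An end-to-end vector w_e in R^k is overparameterized as w_e = w_2 w_1, where w_1 is in R^k and w_2 is a nonzero scalar. Let g_t be the gradient of the loss L at w_e^(t). Vanilla SGD with step size eta updates w_1 and w_2 simultaneously, and the product is expanded to first order in eta:

```latex
\begin{aligned}
w_1^{(t+1)} &= w_1^{(t)} - \eta\, w_2^{(t)} g_t, \qquad
w_2^{(t+1)} = w_2^{(t)} - \eta\, \big(w_1^{(t)}\big)^{\top} g_t, \\
w_e^{(t+1)} &= w_2^{(t+1)} w_1^{(t+1)}
  = w_e^{(t)} - \eta \big(w_2^{(t)}\big)^{2} g_t
    - \eta \Big(\big(w_1^{(t)}\big)^{\top} g_t\Big)\, w_1^{(t)} + O(\eta^{2}) \\
 &= w_e^{(t)} - \eta \big(w_2^{(t)}\big)^{2} g_t
    - \eta\, \frac{\big(w_e^{(t)}\big)^{\top} g_t}{\big(w_2^{(t)}\big)^{2}}\, w_e^{(t)} + O(\eta^{2}).
\end{aligned}
```

Loosely speaking, the factor (w_2^(t))^2 rescales the step over time, which acts like an adaptive learning rate. When w_e starts near zero, w_e^(t) is essentially the accumulation of earlier updates. The last term therefore re-applies a weighted combination of past gradients, which resembles momentum. The paper's own derivation for MF may use a different construction and different assumptions.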

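The abstract states that overparameterization changes the optimization dynamics but not the expressiveness of MF. The pure-Python sketch below illustrates both points under one hypothetical construction, which is not necessarily the paper's. The prediction is r_hat = p_u^T W q_i, where W is an extra k x k matrix initialized to the identity, so training starts from the same model as standard MF. The names train_standard, train_overparam, and collapse, along with the step size, the initialization, and the toy data, are all illustrative assumptions. collapse folds W into the user factors, which shows that the overparameterized model still represents only rank-k predictions.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mat_vec(W, v):
    """Return W v for a k x k matrix W stored as a list of rows."""
    return [dot(row, v) for row in W]


def vec_mat(u, W):
    """Return u^T W (equivalently W^T u) as a flat list."""
    return [sum(u[r] * W[r][c] for r in range(len(u))) for c in range(len(W[0]))]


def init_factors(n, k, rng, scale=0.1):
    """Small Gaussian initialization for n latent vectors of length k."""
    return [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(n)]


def mse(ratings, predict):
    """Mean squared error of predict(u, i) over (user, item, rating) triples."""
    return sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def train_standard(ratings, n_users, n_items, k=4, lr=0.02, epochs=100, seed=0):
    """Standard MF, r_hat = p_u . q_i, trained with vanilla SGD on squared error."""
    rng = random.Random(seed)
    P = init_factors(n_users, k, rng)
    Q = init_factors(n_items, k, rng)

    def predict(u, i):
        return dot(P[u], Q[i])

    history = []
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u][:], Q[i][:]  # copies so both updates use the old values
            err = r - dot(pu, qi)
            for f in range(k):
                P[u][f] += lr * err * qi[f]
                Q[i][f] += lr * err * pu[f]
        history.append(mse(ratings, predict))
    return P, Q, history


def train_overparam(ratings, n_users, n_items, k=4, lr=0.02, epochs=100, seed=0):
    """Hypothetical overparameterized MF, r_hat = p_u^T W q_i, trained with vanilla SGD."""
    rng = random.Random(seed)
    P = init_factors(n_users, k, rng)  # same seed, so same P and Q as train_standard
    Q = init_factors(n_items, k, rng)
    # Extra k x k matrix; identity init makes the starting model equal to standard MF.
    W = [[1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]

    def predict(u, i):
        return dot(P[u], mat_vec(W, Q[i]))

    history = []
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u][:], Q[i][:]
            Wq = mat_vec(W, qi)  # gradient direction for p_u
            pW = vec_mat(pu, W)  # W^T p_u: gradient direction for q_i
            err = r - dot(pu, Wq)
            for f in range(k):
                P[u][f] += lr * err * Wq[f]
                Q[i][f] += lr * err * pW[f]
            for a in range(k):  # gradient w.r.t. W is the outer product p_u q_i^T
                for b in range(k):
                    W[a][b] += lr * err * pu[a] * qi[b]
        history.append(mse(ratings, predict))
    return P, Q, W, history


def collapse(P, W):
    """Fold W into user factors: p_u^T W q_i == (W^T p_u) . q_i, so rank stays <= k."""
    return [vec_mat(p, W) for p in P]


if __name__ == "__main__":
    # Toy ratings r = (u % 3) + (i % 2) + 1 with some entries held out (rank <= 3).
    toy = [(u, i, float(u % 3 + i % 2 + 1))
           for u in range(6) for i in range(5) if (u + i) % 4 != 0]
    _, _, h_std = train_standard(toy, 6, 5)
    _, _, _, h_over = train_overparam(toy, 6, 5)
    print("standard MF, final MSE:", h_std[-1])
    print("overparameterized MF, final MSE:", h_over[-1])
```

The abstract describes comparing the two loss histories on real datasets. On a toy problem like this one, which variant converges faster depends on the step size and the initialization, so the sketch makes no claim either way.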

Pages: 89-97

Page count: 9

References: 35