Convergence of Alternating Gradient Descent for Matrix Factorization

Cited by: 0
Authors: Ward, Rachel [1]; Kolda, Tamara G. [2]
Affiliations:
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] MathSci.ai, Dublin, CA USA
Keywords: none listed
DOI: none available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $A \in \mathbb{R}^{m \times n}$, $T = C \left( \sigma_1(A) / \sigma_r(A) \right)^2 \log(1/\epsilon)$ iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization $\|A - X_T Y_T^\top\|_F^2 \le \epsilon \|A\|_F^2$ with high probability, starting from an atypical random initialization. The factors have rank $d \ge r$, so that $X_T \in \mathbb{R}^{m \times d}$ and $Y_T \in \mathbb{R}^{n \times d}$, and mild overparameterization suffices for the constant $C$ in the iteration complexity $T$ to be an absolute constant. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice. Our proof is conceptually simple: a uniform Polyak-Łojasiewicz (PL) inequality and a uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our random initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
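The algorithm the abstract describes can be made concrete with a short simulation. Below is a minimal NumPy sketch of alternating gradient descent on $f(X, Y) = \frac{1}{2}\|A - XY^\top\|_F^2$: each iteration takes one fixed-step gradient step in $X$ with $Y$ held fixed, then one in $Y$ using the updated $X$. The unbalanced initialization ($X_0$ Gaussian at the scale of $\sqrt{\sigma_1(A)}$, $Y_0 = 0$), the heuristic step size $\eta = 1/(4\sigma_1(A))$, and the name agd_factorize are illustrative assumptions for this sketch, not the paper's exact prescription.

import numpy as np

def agd_factorize(A, d, T=2000, seed=0):
    # Alternating gradient descent on f(X, Y) = 0.5 * ||A - X Y^T||_F^2,
    # with a step size held fixed for all T iterations.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s1 = np.linalg.norm(A, 2)          # sigma_1(A), the largest singular value
    eta = 1.0 / (4.0 * s1)             # heuristic fixed step size (assumption)
    # Illustrative unbalanced random initialization (assumption): X_0 is
    # Gaussian with columns at scale sqrt(sigma_1(A)); Y_0 starts at zero.
    X = rng.standard_normal((m, d)) * np.sqrt(s1 / m)
    Y = np.zeros((n, d))
    for _ in range(T):
        R = A - X @ Y.T                # residual with current factors
        X = X + eta * (R @ Y)          # gradient step in X, holding Y fixed
        R = A - X @ Y.T                # refresh residual with the updated X
        Y = Y + eta * (R.T @ X)        # gradient step in Y: the alternation
    return X, Y

# Usage: factor a random rank-r matrix with mild overparameterization d > r.
m, n, r, d = 100, 80, 5, 8
rng = np.random.default_rng(1)
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
X, Y = agd_factorize(A, d)
print(np.linalg.norm(A - X @ Y.T) ** 2 / np.linalg.norm(A) ** 2)

With $d > r$ the squared relative error should decay geometrically, and tightening the target $\epsilon$ grows the iteration count only logarithmically, consistent with the $\log(1/\epsilon)$ factor in the stated complexity $T$.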
Pages: 14