Convergence of Alternating Gradient Descent for Matrix Factorization

Cited by: 0
Authors: Ward, Rachel [1]; Kolda, Tamara G. [2]
Affiliations:
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] MathSci.ai, Dublin, CA USA
Keywords: none listed
DOI: none available
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $A \in \mathbb{R}^{m \times n}$, $T = C \left( \sigma_1(A) / \sigma_r(A) \right)^2 \log(1/\epsilon)$ iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization $\|A - X_T Y_T^\top\|_F^2 \le \epsilon \|A\|_F^2$ with high probability, starting from an atypical random initialization. The factors have rank $d \ge r$, so that $X_T \in \mathbb{R}^{m \times d}$ and $Y_T \in \mathbb{R}^{n \times d}$, and mild overparameterization suffices for the constant $C$ in the iteration complexity $T$ to be an absolute constant. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly improves the convergence rate of gradient descent in practice. Our proof is conceptually simple: a uniform Polyak-Łojasiewicz (PL) inequality and a uniform Lipschitz smoothness constant are guaranteed for a sufficient number of iterations, starting from our random initialization. Our proof method should be useful for extending and simplifying convergence analyses for a broader class of nonconvex low-rank factorization problems.
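The algorithm the abstract describes can be made concrete with a short simulation. Below is a minimal NumPy sketch of alternating gradient descent on $f(X, Y) = \frac{1}{2}\|A - XY^\top\|_F^2$: each iteration takes one fixed-step gradient step in $X$ with $Y$ held fixed, then one in $Y$ using the updated $X$. The unbalanced initialization ($X_0$ Gaussian at the scale of $\sqrt{\sigma_1(A)}$, $Y_0 = 0$), the heuristic step size $\eta = 1/(4\sigma_1(A))$, and the name agd_factorize are illustrative assumptions for this sketch, not the paper's exact prescription.

import numpy as np

def agd_factorize(A, d, T=2000, seed=0):
    # Alternating gradient descent on f(X, Y) = 0.5 * ||A - X Y^T||_F^2,
    # with a step size held fixed for all T iterations.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    s1 = np.linalg.norm(A, 2)          # sigma_1(A), the largest singular value
    eta = 1.0 / (4.0 * s1)             # heuristic fixed step size (assumption)
    # Illustrative unbalanced random initialization (assumption): X_0 is
    # Gaussian with columns at scale sqrt(sigma_1(A)); Y_0 starts at zero.
    X = rng.standard_normal((m, d)) * np.sqrt(s1 / m)
    Y = np.zeros((n, d))
    for _ in range(T):
        R = A - X @ Y.T                # residual with current factors
        X = X + eta * (R @ Y)          # gradient step in X, holding Y fixed
        R = A - X @ Y.T                # refresh residual with the updated X
        Y = Y + eta * (R.T @ X)        # gradient step in Y: the alternation
    return X, Y

# Usage: factor a random rank-r matrix with mild overparameterization d > r.
m, n, r, d = 100, 80, 5, 8
rng = np.random.default_rng(1)
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
X, Y = agd_factorize(A, d)
print(np.linalg.norm(A - X @ Y.T) ** 2 / np.linalg.norm(A) ** 2)

With $d > r$ the squared relative error should decay geometrically, and tightening the target $\epsilon$ grows the iteration count only logarithmically, consistent with the $\log(1/\epsilon)$ factor in the stated complexity $T$.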
Pages: 14