Momentum-based variance-reduced stochastic Bregman proximal gradient methods for nonconvex nonsmooth optimization

Cited by: 1
Authors
Liao, Shichen [1 ]
Liu, Yan [2 ,3 ]
Han, Congying [1 ]
Guo, Tiande [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Math Sci, 19A Yuquan Rd, Beijing 100049, Peoples R China
[2] Nankai Univ, Sch Stat & Data Sci, KLMDASR, LEBPS, 94 Weijin Rd, Tianjin 300071, Peoples R China
[3] Nankai Univ, LPMC, 94 Weijin Rd, Tianjin 300071, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Stochastic Bregman method; Recursive momentum; Variance reduction; Nonconvex nonsmooth optimization; First-order methods; Minimization; Convergence; Continuity; Algorithm; Descent
DOI
10.1016/j.eswa.2024.125960
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning, it is common to encounter a class of nonconvex, nonsmooth composite optimization problems whose objective function may not possess a globally Lipschitz continuous gradient. To address such problems, we investigate stochastic Bregman proximal gradient methods with recursive momentum and propose SBPG-STORM, whose convergence in expectation is established under the bounded-variance assumption. Although this assumption is widely used, it often fails to hold in practical applications. One way to remove it is to use variance-reduction estimators, which we integrate with recursive momentum to obtain SBPG-SVRRM and SBPG-SVRRM+. Without assuming bounded variance, we prove convergence in expectation by constructing descent Lyapunov functions for these methods. Furthermore, under the expectation Kurdyka-Łojasiewicz (KŁ) property, we establish global convergence of the iterates and derive convergence rates for different Łojasiewicz exponents. Preliminary numerical results indicate that the proposed methods perform comparably to state-of-the-art methods.
Pages: 25
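The record does not include code. As a rough illustration only (not the authors' implementation), the following minimal Python sketch combines a STORM-style recursive-momentum gradient estimator with a Bregman proximal step, which is the basic structure behind SBPG-STORM as described in the abstract. The ℓ1-regularized least-squares test problem, the Euclidean Bregman kernel (which reduces the proximal step to soft-thresholding), and the step-size and momentum schedules are assumptions chosen for readability, not taken from the paper.

    import numpy as np

    # Illustrative sketch, not the authors' code: STORM-style recursive-momentum
    # stochastic gradient estimator plugged into a Bregman proximal step.
    # The Bregman kernel is specialized to the Euclidean one, so the proximal
    # step reduces to soft-thresholding for g(x) = lam * ||x||_1.
    # Problem: min_x  E_xi[ f(x; xi) ] + g(x), with f a least-squares loss.

    rng = np.random.default_rng(0)
    n, d = 200, 50
    A = rng.normal(size=(n, d))
    x_true = np.zeros(d)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.normal(size=n)
    lam = 0.05

    def stoch_grad(x, idx):
        """Mini-batch gradient of the least-squares loss on rows idx."""
        Ai, bi = A[idx], b[idx]
        return Ai.T @ (Ai @ x - bi) / len(idx)

    def bregman_prox_l1(x, g_est, step, lam):
        """Bregman proximal step with the Euclidean kernel (soft-thresholding)."""
        z = x - step * g_est
        return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

    x_prev = np.zeros(d)
    idx = rng.choice(n, size=10, replace=False)
    d_est = stoch_grad(x_prev, idx)                  # initial gradient estimator
    x = bregman_prox_l1(x_prev, d_est, 0.01, lam)

    for k in range(1, 500):
        a_k = min(1.0, 2.0 / (k + 1))                # illustrative momentum schedule
        idx = rng.choice(n, size=10, replace=False)
        # STORM-style recursive momentum: correct the previous estimator with the
        # gradient difference evaluated on the same fresh mini-batch.
        d_est = stoch_grad(x, idx) + (1.0 - a_k) * (d_est - stoch_grad(x_prev, idx))
        x_prev, x = x, bregman_prox_l1(x, d_est, 0.01, lam)

    print("final objective:",
          0.5 * np.mean((A @ x - b) ** 2) + lam * np.sum(np.abs(x)))

The SVRRM variants described in the abstract replace this plain recursive-momentum estimator with variance-reduced ones that avoid the bounded-variance assumption; their exact form is given in the paper itself.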