Learning-Rate-Free Momentum SGD with Reshuffling Converges in Nonsmooth Nonconvex Optimization

Times Cited: 0
Authors
Hu, Xiaoyin [1 ,2 ]
Xiao, Nachuan [3 ]
Liu, Xin [4 ,5 ]
Toh, Kim-Chuan [6 ,7 ]
Affiliations
[1] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310015, Peoples R China
[2] Hangzhou City Univ, Acad Edge Intelligence, Hangzhou, Peoples R China
[3] Natl Univ Singapore, Inst Operat Res & Analyt, Singapore, Singapore
[4] Chinese Acad Sci, Acad Math & Syst Sci, State Key Lab Sci & Engn Comp, Beijing, Peoples R China
[5] Univ Chinese Acad Sci, Beijing, Peoples R China
[6] Natl Univ Singapore, Dept Math, Singapore 119076, Singapore
[7] Natl Univ Singapore, Inst Operat Res & Analyt, Singapore 119076, Singapore
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Nonsmooth optimization; Stochastic subgradient methods; Nonconvex optimization; Learning-rate free; Differential inclusion; MODEL;
DOI
10.1007/s10915-025-02798-0
Chinese Library Classification (CLC)
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
In this paper, we propose a generalized framework for developing learning-rate-free momentum stochastic gradient descent (SGD) methods for minimizing nonsmooth nonconvex functions, especially in training nonsmooth neural networks. Our framework adaptively generates learning rates from the historical data of stochastic subgradients and iterates. Under mild conditions, we prove that the framework converges globally to stationary points of the objective function in the sense of the conservative field, thereby providing convergence guarantees for training nonsmooth neural networks. Building on this framework, we develop a novel learning-rate-free momentum SGD method (LFM). Preliminary numerical experiments show that LFM performs comparably to state-of-the-art learning-rate-free methods, which have not been shown to converge theoretically, across well-known neural network training benchmarks.
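The abstract does not state the concrete update rule, but the mechanism it describes (a momentum SGD step whose learning rate is generated from the history of stochastic subgradients and iterates) can be sketched roughly. The Python sketch below uses a DoG-style "distance over gradients" step size as an assumed placeholder for the adaptive rule; the class name, the momentum coefficient, and the step-size formula are illustrative assumptions, not the paper's LFM method.

import numpy as np

class LearningRateFreeMomentumSGD:
    # Illustrative learning-rate-free momentum SGD. The step size is built from
    # the running history of iterate displacements and subgradient norms (a
    # DoG-style "distance over gradients" rule). This is an assumed stand-in,
    # not the LFM update proposed in the paper.
    def __init__(self, x0, beta=0.9, eps=1e-6):
        self.x = np.asarray(x0, dtype=float)   # current iterate
        self.anchor = self.x.copy()            # reference point for the distance term
        self.m = np.zeros_like(self.x)         # momentum buffer (EMA of subgradients)
        self.beta = beta                       # momentum coefficient (assumed value)
        self.max_dist = eps                    # running max of ||x_s - x_0||
        self.grad_sq_sum = 0.0                 # running sum of ||g_s||^2

    def step(self, subgrad):
        g = np.asarray(subgrad, dtype=float)
        # Update the history that determines the adaptive learning rate.
        self.grad_sq_sum += float(np.sum(g * g))
        self.max_dist = max(self.max_dist, float(np.linalg.norm(self.x - self.anchor)))
        # Learning rate = distance travelled so far / accumulated subgradient energy.
        lr = self.max_dist / np.sqrt(self.grad_sq_sum + 1e-12)
        # Momentum (exponential moving average of subgradients), then the update.
        self.m = self.beta * self.m + (1.0 - self.beta) * g
        self.x = self.x - lr * self.m
        return self.x

# Toy usage on the nonsmooth problem f(x) = |x|, whose subgradient away from 0 is sign(x).
opt = LearningRateFreeMomentumSGD(x0=[5.0])
for _ in range(1000):
    opt.step(np.sign(opt.x))
print(opt.x)  # typically ends up close to the minimizer at 0, with no tuned learning rate

The point of the sketch is only that every quantity entering the learning rate is observable from past iterates and subgradients, so no step size has to be tuned by hand; the paper's actual rule and its convergence analysis are given in the article itself.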
Pages: 31