Learning-Rate-Free Momentum SGD with Reshuffling Converges in Nonsmooth Nonconvex Optimization

Cited by: 0
Authors
Hu, Xiaoyin [1,2]
Xiao, Nachuan [3]
Liu, Xin [4,5]
Toh, Kim-Chuan [6,7]
Affiliations
[1] Hangzhou City Univ, Sch Comp & Comp Sci, Hangzhou 310015, Peoples R China
[2] Hangzhou City Univ, Acad Edge Intelligence, Hangzhou, Peoples R China
[3] Natl Univ Singapore, Inst Operat Res & Analyt, Singapore, Singapore
[4] Chinese Acad Sci, Acad Math & Syst Sci, State Key Lab Sci & Engn Comp, Beijing, Peoples R China
[5] Univ Chinese Acad Sci, Beijing, Peoples R China
[6] Natl Univ Singapore, Dept Math, Singapore 119076, Singapore
[7] Natl Univ Singapore, Inst Operat Res & Analyt, Singapore 119076, Singapore
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Nonsmooth optimization; Stochastic subgradient methods; Nonconvex optimization; Learning-rate free; Differential inclusion;
DOI
10.1007/s10915-025-02798-0
Chinese Library Classification
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
In this paper, we propose a generalized framework for developing learning-rate-free momentum stochastic gradient descent (SGD) methods for minimizing nonsmooth nonconvex functions, especially for training nonsmooth neural networks. Our framework adaptively generates learning rates from the history of stochastic subgradients and iterates. Under mild conditions, we prove that the proposed framework enjoys global convergence to stationary points of the objective function in the sense of conservative fields, hence providing convergence guarantees for training nonsmooth neural networks. Based on this framework, we develop a novel learning-rate-free momentum SGD method (LFM). Preliminary numerical experiments show that LFM performs comparably to state-of-the-art learning-rate-free methods (which lack theoretical convergence guarantees) on well-known neural network training benchmarks.
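The abstract does not spell out the update rule, but the core idea, deriving the learning rate from the accumulated history of iterates and stochastic subgradients instead of hand-tuning it, can be illustrated with a small sketch. The following Python snippet is a minimal, hypothetical example in the spirit of distance-over-gradients (DoG) style learning-rate-free methods combined with heavy-ball momentum; the function `lfm_sketch`, the specific step-size rule, and all parameter choices are illustrative assumptions, not the paper's actual LFM algorithm.

```python
import numpy as np

def lfm_sketch(grad_fn, x0, n_steps=1000, momentum=0.9, eps=1e-6):
    """Illustrative learning-rate-free momentum SGD (NOT the paper's LFM).

    grad_fn(x) returns a stochastic subgradient at x. The step size is
    derived from two running statistics of the trajectory: the largest
    distance travelled from x0 and the accumulated squared subgradient
    norms (a DoG-style heuristic assumed here for illustration).
    """
    x = x0.copy()
    m = np.zeros_like(x)   # heavy-ball momentum buffer
    max_dist = eps         # largest distance from x0 seen so far
    grad_sq_sum = eps      # accumulated squared subgradient norms
    for _ in range(n_steps):
        g = grad_fn(x)
        grad_sq_sum += float(np.sum(g * g))
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
        # adaptive step: distance travelled over accumulated gradient mass,
        # so no learning rate needs to be supplied by the user
        lr = max_dist / np.sqrt(grad_sq_sum)
        m = momentum * m + (1.0 - momentum) * g
        x = x - lr * m
    return x

# Toy nonsmooth problem: f(x) = ||x - 1||_1 with noisy subgradients.
rng = np.random.default_rng(0)
grad = lambda x: np.sign(x - 1.0) + 0.1 * rng.standard_normal(x.shape)
x_final = lfm_sketch(grad, x0=np.zeros(10), n_steps=2000)
```

Note the design choice this sketch shares with the methods the abstract describes: both statistics feeding the step size are cheap byproducts of the iteration itself, so no tuning pass over candidate learning rates is needed.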
Pages: 31