A new inexact stochastic recursive gradient descent algorithm with Barzilai–Borwein step size in machine learning

Cited by: 0
Authors
Yi-ming Yang
Fu-sheng Wang
Jin-xiang Li
Yuan-yuan Qin
Affiliations
[1] School of Mathematics and Statistics, Taiyuan Normal University
Source
Nonlinear Dynamics | 2023, Vol. 111
Keywords
Stochastic optimization; Stochastic gradient; Variance reduction; BB method
DOI
Not available
Abstract
The inexact SARAH (iSARAH) algorithm, a variance-reduction variant of the SARAH algorithm, has recently surged into prominence for solving large-scale optimization problems in machine learning. The performance of iSARAH depends significantly on the choice of step-size sequence. In this paper, we develop a new algorithm, iSARAH-BB, which builds on iSARAH and employs the Barzilai–Borwein (BB) method to compute the step size automatically. With this adaptive step size, iSARAH-BB combines the advantages of the iSARAH and BB methods. We analyze the convergence rate and complexity of the new algorithm under standard assumptions. Numerical experiments on standard datasets indicate that the proposed iSARAH-BB algorithm is robust to the choice of initial step size, and that it is effective and more competitive than existing algorithms.
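As a rough illustration of the method the abstract describes, the sketch below pairs a SARAH-style recursive gradient estimator with a BB step size recomputed once per outer iteration, using a sampled (inexact) outer gradient in the iSARAH spirit. This is a minimal sketch under stated assumptions, not the authors' implementation: the name isarah_bb, the gradient oracle grad_i, the sample sizes, the 1/m scaling of the BB ratio, and the denominator safeguard are all illustrative choices.

```python
import numpy as np

def isarah_bb(grad_i, n, w0, epochs=10, m=None, batch=1,
              sample_size=256, eta0=0.1, rng=None):
    """Sketch of an iSARAH-style loop with a Barzilai-Borwein (BB) step size.

    grad_i(w, idx) should return the average gradient of the component
    functions indexed by idx at w; all names and defaults here are
    illustrative assumptions, not the paper's implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = n if m is None else m                 # inner-loop length
    eta = eta0                                # used until BB takes over
    w = np.asarray(w0, dtype=float).copy()
    w_snap, g_snap = None, None               # previous outer iterate/gradient
    for _ in range(epochs):
        # iSARAH: the outer gradient is an inexact, sampled estimate
        # rather than an exact pass over all n components.
        idx = rng.choice(n, size=min(sample_size, n), replace=False)
        v = grad_i(w, idx)
        if w_snap is not None:
            # BB step size eta = ||s||^2 / (m * s^T y), with
            # s = w_k - w_{k-1} and y = g_k - g_{k-1}, safeguarded
            # against a vanishing or negative denominator.
            s, y = w - w_snap, v - g_snap
            sty = s @ y
            if abs(sty) > 1e-12:
                eta = (s @ s) / (m * abs(sty))
        w_snap, g_snap = w.copy(), v.copy()
        # SARAH inner loop: recursive (biased) gradient estimator
        # v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}.
        w_prev = w.copy()
        w = w - eta * v
        for _ in range(m - 1):
            it = rng.integers(n, size=batch)
            v = grad_i(w, it) - grad_i(w_prev, it) + v
            w_prev, w = w.copy(), w - eta * v
    return w
```

Scaling the BB ratio by the inner-loop length m mirrors the convention used in stochastic BB variants such as SVRG-BB, where the raw BB step is typically too large for stochastic inner updates; grad_i could, for instance, wrap a logistic-regression gradient over the sampled rows of a data matrix.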
Pages: 3575-3586
Number of pages: 11