High-Probability Complexity Bounds for Non-smooth Stochastic Convex Optimization with Heavy-Tailed Noise

Cited by: 0
Authors
Gorbunov, Eduard [1]
Danilova, Marina [2,3]
Shibaev, Innokentiy [3,4]
Dvurechensky, Pavel [5]
Gasnikov, Alexander [3,6,7]
Affiliations
[1] Mohamed bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[2] RAS, Inst Control Sci, 65 Profsoyuznaya St, Moscow 117997, Russia
[3] Moscow Inst Phys & Technol, 9 Institutskiy per, Dolgoprudnyi 141701, Moscow, Russia
[4] Natl Res Univ Higher Sch Econ, 11 Pokrovsky Blvd, Moscow 109028, Russia
[5] Weierstrass Inst Appl Anal & Stochast, Mohrenstr 39, D-10117 Berlin, Germany
[6] Innopolis Univ, 1 Universitetskaya St, Innopolis 420500, Tatarstan, Russia
[7] RAS, Inst Informat Transmiss Problems, 19 Bolshoy Karetny per., build. 1, Moscow 127051, Russia
Keywords
Stochastic optimization; Convex optimization; Non-smooth optimization; High-probability bounds; Heavy-tailed noise; Inequalities; Algorithms
DOI
10.1007/s10957-024-02533-z
Chinese Library Classification (CLC)
C93 [Management]; O22 [Operations Research]
Subject Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
Stochastic first-order methods are standard for training large-scale machine learning models. Because their behavior is random, a particular run of an algorithm may produce a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residuals with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds whose dependence on the confidence level is either negative-power, or logarithmic but only under an additional assumption of sub-Gaussian (light-tailed) noise, which may not hold in practice. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis covers generalized smooth objectives with Hölder-continuous gradients, and for both methods we provide an extension to strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
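For illustration, the following minimal Python sketch shows the basic clipped stochastic (sub)gradient update that methods with gradient clipping build on; the function names, the constant stepsize gamma, and the clipping level lam are illustrative placeholders and do not reproduce the stepsize and clipping rules proposed in the paper.

import numpy as np

def clip(g, lam):
    # Clipping operator: rescale g so its norm is at most lam (identity if already shorter).
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_sgd(stoch_grad, x0, n_steps, gamma, lam):
    # Illustrative clipped-SGD loop for a convex objective.
    # stoch_grad(x) returns a (possibly heavy-tailed) stochastic (sub)gradient at x;
    # gamma and lam are placeholder constants, not the schedules derived in the paper.
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(n_steps):
        g = clip(stoch_grad(x), lam)   # clip the noisy gradient before the step
        x = x - gamma * g              # standard (sub)gradient step
        iterates.append(x.copy())
    # Returning the averaged iterate is a common choice in non-smooth convex settings.
    return np.mean(iterates, axis=0)

Clipping bounds the length of every stochastic step, which is what makes high-probability guarantees attainable even when the gradient noise has heavy tails.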
Pages: 2679-2738
Number of pages: 60