Central limit theorems for stochastic gradient descent with averaging for stable manifolds

Cited by: 2
Authors
Dereich, Steffen [1]
Kassing, Sebastian [2]
Affiliations
[1] Univ Munster, Inst Math Stochast, Fac Math & Comp Sci, Munster, Germany
[2] Univ Bielefeld, Fac Math, Bielefeld, Germany
Source
ELECTRONIC JOURNAL OF PROBABILITY | 2023, Vol. 28
Keywords
stochastic approximation; Robbins-Monro; Ruppert-Polyak average; deep learning; stable manifold
DOI
10.1214/23-EJP947
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
In this article, we establish new central limit theorems for Ruppert-Polyak averaged stochastic gradient descent schemes. In contrast to previous work, we do not assume that convergence occurs to an isolated attractor, but instead allow convergence to a stable manifold. On the stable manifold the target function is constant, and the oscillations of the iterates in the tangential direction may be significantly larger than those in the normal direction. We nevertheless recover a central limit theorem for the averaged scheme in the normal direction, with the same rates as in the case of isolated attractors. In the setting where the magnitude of the random perturbation is of constant order, our results cover step sizes $\gamma_n = C_\gamma n^{-\gamma}$ with $C_\gamma > 0$ and $\gamma \in (3/4, 1)$. In particular, we show that the beneficial effect of averaging prevails in these more general situations.
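As a point of reference, the following is a minimal illustrative sketch (not taken from the paper) of a Ruppert-Polyak averaged SGD scheme with step sizes $\gamma_n = C_\gamma n^{-\gamma}$, $\gamma \in (3/4, 1)$, as described in the abstract. The objective $f(x, y) = y^2$, whose minimizers form a stable manifold (the $x$-axis, on which $f$ is constant), and the constant-order noise model are hypothetical stand-ins for a generic stochastic gradient oracle.

```python
# Illustrative sketch, not the paper's construction: Ruppert-Polyak
# averaged SGD with polynomial step sizes gamma_n = C_gamma * n**(-gamma).
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(theta):
    # Hypothetical oracle: gradient of f(x, y) = y**2, so the whole
    # x-axis is a manifold of minimizers on which f is constant,
    # perturbed by additive noise of constant order.
    grad = np.array([0.0, 2.0 * theta[1]])
    return grad + rng.normal(scale=0.1, size=2)

C_gamma, gamma = 1.0, 0.8          # step-size constants, gamma in (3/4, 1)
theta = np.array([1.0, 1.0])       # SGD iterate
theta_bar = theta.copy()           # Ruppert-Polyak running average

for n in range(1, 10_001):
    theta = theta - C_gamma * n ** (-gamma) * stochastic_grad(theta)
    theta_bar += (theta - theta_bar) / (n + 1)  # online mean of iterates

print("last iterate:", theta)      # oscillates tangentially along the manifold
print("averaged    :", theta_bar)  # normal component concentrates faster
```

In this toy setup the averaged iterate's normal component (the $y$-coordinate) is the quantity for which the paper's central limit theorem applies, while the tangential component merely drifts along the manifold of minimizers.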
Pages: 48