Central limit theorems for stochastic gradient descent with averaging for stable manifolds

Cited by: 2
Authors
Dereich, Steffen [1]
Kassing, Sebastian [2]
Affiliations
[1] Univ Munster, Inst Math Stochast, Fac Math & Comp Sci, Munster, Germany
[2] Univ Bielefeld, Fac Math, Bielefeld, Germany
Source
ELECTRONIC JOURNAL OF PROBABILITY, 2023, Vol. 28
Keywords
stochastic approximation; Robbins-Monro; Ruppert-Polyak average; deep learning; stable manifold
DOI
10.1214/23-EJP947
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
In this article, we establish new central limit theorems for Ruppert-Polyak averaged stochastic gradient descent schemes. Compared to previous work we do not assume that convergence occurs to an isolated attractor but instead allow convergence to a stable manifold. On the stable manifold the target function is constant and the oscillations of the iterates in the tangential direction may be significantly larger than the ones in the normal direction. We still recover a central limit theorem for the averaged scheme in the normal direction with the same rates as in the case of isolated attractors. In the setting where the magnitude of the random perturbation is of constant order, our research covers step sizes γ_n = C_γ n^{-γ} with C_γ > 0 and γ ∈ (3/4, 1). In particular, we show that the beneficial effect of averaging prevails in more general situations.
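To make the scheme concrete, the following Python sketch runs Ruppert-Polyak averaged SGD with step sizes γ_n = C_γ n^{-γ} for γ ∈ (3/4, 1). This is an illustrative sketch, not the authors' code: the quadratic objective and the i.i.d. Gaussian perturbation of the gradient are placeholder assumptions, not the setting treated in the paper.

    import numpy as np

    def averaged_sgd(grad, x0, n_steps, c_gamma=1.0, gamma=0.8, noise=1.0, seed=0):
        # Robbins-Monro iteration with Ruppert-Polyak averaging.
        # Step sizes gamma_n = c_gamma * n**(-gamma) with gamma in (3/4, 1), as in the abstract;
        # the additive Gaussian noise on the gradient is an illustrative assumption only.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        avg = np.zeros_like(x)
        for n in range(1, n_steps + 1):
            step = c_gamma * n ** (-gamma)                       # gamma_n = C_gamma * n^{-gamma}
            g = grad(x) + noise * rng.standard_normal(x.shape)   # noisy gradient oracle
            x = x - step * g                                     # Robbins-Monro / SGD update
            avg += (x - avg) / n                                 # running Ruppert-Polyak average
        return x, avg

    # Toy example (hypothetical objective): f(x) = 0.5 * ||x||^2, so grad f(x) = x.
    last_iterate, polyak_average = averaged_sgd(lambda x: x, x0=np.ones(2), n_steps=10_000)
    print(last_iterate, polyak_average)

The running average polyak_average is the averaged scheme for which the abstract states the central limit theorem; the last iterate is kept only for comparison.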
Pages: 48