Online stochastic gradient descent on non-convex losses from high-dimensional inference

Cited by: 0
Authors
Ben Arous, Gerard [1 ]
Gheissari, Reza [2 ,3 ]
Jagannath, Aukosh [4 ,5 ]
Affiliations
[1] NYU, Courant Inst Math Sci, New York, NY 10012 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[4] Univ Waterloo, Dept Stat, Waterloo, ON, Canada
[5] Univ Waterloo, Dept Actuarial Sci & Appl Math, Waterloo, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
stochastic gradient descent; parameter estimation; non-convex optimization; supervised learning; generalized linear models; tensor PCA; EMPIRICAL RISK; THRESHOLDS; RECOVERY; INITIALIZATION; LANDSCAPE;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law of large numbers type behavior. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
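To make the setup described in the abstract concrete, here is a minimal sketch of online (one-pass) SGD for noiseless Gaussian phase retrieval, one of the tasks listed above. The model, per-sample loss, step size, horizon, and the retraction-by-normalization step are illustrative assumptions, not the authors' exact algorithm or constants; the sketch only shows the structure: each iteration consumes one fresh sample, the estimator starts at a uniformly random point on the sphere with correlation of order n^(-1/2) with the ground truth, and progress is monitored through that correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500                           # ambient dimension
v = rng.standard_normal(n)
v /= np.linalg.norm(v)            # unknown signal, placed on the unit sphere

def sample():
    """One fresh observation for noiseless phase retrieval: y = (v . x)^2."""
    x = rng.standard_normal(n)
    return x, (v @ x) ** 2

def grad(u, x, y):
    """Gradient in u of the per-sample loss 0.5 * (y - (u . x)^2)^2."""
    return -2.0 * (y - (u @ x) ** 2) * (u @ x) * x

# Random start: correlation with v is of order n^(-1/2).
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

eta = 0.05 / n                    # illustrative step size, not tuned to the paper's thresholds
for k in range(200 * n):          # one fresh sample per step (online / one-pass SGD)
    x, y = sample()
    u -= eta * grad(u, x, y)
    u /= np.linalg.norm(u)        # retract back onto the sphere
    if k % (20 * n) == 0:
        print(f"step {k:7d}   |<u, v>| = {abs(u @ v):.3f}")
```

Running a sketch like this, one typically sees |<u, v>| linger near its random-start value of order n^(-1/2) for an extended search phase before climbing rapidly toward 1 (up to the sign ambiguity inherent to phase retrieval), matching the search-then-descend picture described in the abstract.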
Pages: 51