Online stochastic gradient descent on non-convex losses from high-dimensional inference

Cited by: 0

Authors:

Ben Arous, Gerard [1]

Gheissari, Reza [2, 3]

Jagannath, Aukosh [4, 5]

Affiliations:

[1] NYU, Courant Inst Math Sci, New York, NY 10012 USA

[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA

[3] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA

[4] Univ Waterloo, Dept Stat, Waterloo, ON, Canada

[5] Univ Waterloo, Dept Actuarial Sci & Appl Math, Waterloo, ON, Canada

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2021 / Vol. 22

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

stochastic gradient descent; parameter estimation; non-convex optimization; supervised learning; generalized linear models; tensor PCA; EMPIRICAL RISK; THRESHOLDS; RECOVERY; INITIALIZATION; LANDSCAPE;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law of large numbers type behavior. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
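The online SGD procedure named in the abstract (a random start, then one fresh sample per step) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function name `online_sgd_sphere`, the unit-sphere constraint, the Gaussian features, the noiseless linear labels, the squared loss, and all parameter values are assumptions chosen to keep the example small. It records the correlation between the iterate and the ground truth at every step, which is the quantity the abstract refers to when it separates the initial search phase from the later descent.

```python
import numpy as np


def online_sgd_sphere(dim=50, n_samples=5000, step=0.01, seed=0):
    """Run online SGD on the unit sphere; return the correlation with the truth per step.

    Illustrative assumptions (not taken from the paper): Gaussian features,
    noiseless linear labels y = <a, x_star>, and squared loss.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth and random start, both uniform on the unit sphere.
    x_star = rng.standard_normal(dim)
    x_star /= np.linalg.norm(x_star)
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)

    correlations = np.empty(n_samples + 1)
    correlations[0] = x @ x_star

    for t in range(n_samples):
        # Online setting: each sample is drawn fresh and used exactly once.
        a = rng.standard_normal(dim)
        y = a @ x_star

        # Gradient of the per-sample loss 0.5 * (<a, x> - y)^2.
        grad = (a @ x - y) * a
        # Remove the radial component so the step is tangent to the sphere.
        grad -= (grad @ x) * x

        # Gradient step followed by renormalization back onto the sphere.
        x = x - step * grad
        x /= np.linalg.norm(x)

        correlations[t + 1] = x @ x_star

    return correlations
```

Calling `online_sgd_sphere()` and comparing the first and last entries of the returned array shows how far the correlation has moved from its random starting value. Swapping the linear label for a harder link function would be the natural way to explore the slower search phase the abstract describes, though the step size would likely need retuning.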

Pages: 51

Total: 50 items

[21] On Byzantine-Resilient High-Dimensional Stochastic Gradient Descent
Data, Deepesh
Diggavi, Suhas
2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 2628 - 2633
[22] Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
Pascal Bianchi
Walid Hachem
Sholom Schechtman
Set-Valued and Variational Analysis, 2022, 30 : 1117 - 1147
[23] Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
Bianchi, Pascal
Hachem, Walid
Schechtman, Sholom
SET-VALUED AND VARIATIONAL ANALYSIS, 2022, 30 (03) : 1117 - 1147
[24] Online Optimization with Predictions and Non-convex Losses
Lin, Yiheng
Goel, Gautam
Wierman, Adam
PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2020, 4 (01)
[25] A New Non-Convex Framework to Improve Asymptotical Knowledge on Generic Stochastic Gradient Descent
Fest, Jean-Baptiste
Repetti, Audrey
Chouzenoux, Emilie
IEEE International Workshop on Machine Learning for Signal Processing, MLSP, 2023, 2023-September
[26] A new non-convex framework to improve asymptotical knowledge on generic stochastic gradient descent
Fest, Jean-Baptiste
Repetti, Audrey
Chouzenoux, Émilie
arXiv, 2023,
[27] Taming Convergence for Asynchronous Stochastic Gradient Descent with Unbounded Delay in Non-Convex Learning
Zhang, Xin
Liu, Jia
Zhu, Zhengyuan
2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 3580 - 3585
[28] Online Optimization with Predictions and Non-convex Losses
Lin, Yiheng
2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
[29] Projected Gradient Descent for Non-Convex Sparse Spike Estimation
Traonmilin, Yann
Aujol, Jean-Francois
Leclaire, Arthur
IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 1110 - 1114
[30] Generalization Bound of Gradient Descent for Non-Convex Metric Learning
Dong, Mingzhi
Yang, Xiaochen
Zhu, Rui
Wang, Yujiang
Xue, Jing-Hao
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
