Online stochastic gradient descent on non-convex losses from high-dimensional inference

Cited by: 0
Authors
Ben Arous, Gerard [1 ]
Gheissari, Reza [2 ,3 ]
Jagannath, Aukosh [4 ,5 ]
Affiliations
[1] NYU, Courant Inst Math Sci, New York, NY 10012 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[4] Univ Waterloo, Dept Stat, Waterloo, ON, Canada
[5] Univ Waterloo, Dept Actuarial Sci & Appl Math, Waterloo, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
stochastic gradient descent; parameter estimation; non-convex optimization; supervised learning; generalized linear models; tensor PCA; EMPIRICAL RISK; THRESHOLDS; RECOVERY; INITIALIZATION; LANDSCAPE;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law of large numbers type behavior. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
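To make the setup described in the abstract concrete, here is a minimal sketch of online (one-pass) SGD for noiseless Gaussian phase retrieval, one of the tasks listed above. The model, per-sample loss, step size, horizon, and the retraction-by-normalization step are illustrative assumptions, not the authors' exact algorithm or constants; the sketch only shows the structure: each iteration consumes one fresh sample, the estimator starts at a uniformly random point on the sphere with correlation of order n^(-1/2) with the ground truth, and progress is monitored through that correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500                           # ambient dimension
v = rng.standard_normal(n)
v /= np.linalg.norm(v)            # unknown signal, placed on the unit sphere

def sample():
    """One fresh observation for noiseless phase retrieval: y = (v . x)^2."""
    x = rng.standard_normal(n)
    return x, (v @ x) ** 2

def grad(u, x, y):
    """Gradient in u of the per-sample loss 0.5 * (y - (u . x)^2)^2."""
    return -2.0 * (y - (u @ x) ** 2) * (u @ x) * x

# Random start: correlation with v is of order n^(-1/2).
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

eta = 0.05 / n                    # illustrative step size, not tuned to the paper's thresholds
for k in range(200 * n):          # one fresh sample per step (online / one-pass SGD)
    x, y = sample()
    u -= eta * grad(u, x, y)
    u /= np.linalg.norm(u)        # retract back onto the sphere
    if k % (20 * n) == 0:
        print(f"step {k:7d}   |<u, v>| = {abs(u @ v):.3f}")
```

Running a sketch like this, one typically sees |<u, v>| linger near its random-start value of order n^(-1/2) for an extended search phase before climbing rapidly toward 1 (up to the sign ambiguity inherent to phase retrieval), matching the search-then-descend picture described in the abstract.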
Pages: 51