Online stochastic gradient descent on non-convex losses from high-dimensional inference

Cited by: 0
Authors
Ben Arous, Gerard [1 ]
Gheissari, Reza [2 ,3 ]
Jagannath, Aukosh [4 ,5 ]
Affiliations
[1] NYU, Courant Inst Math Sci, New York, NY 10012 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept EECS, Berkeley, CA 94720 USA
[4] Univ Waterloo, Dept Stat, Waterloo, ON, Canada
[5] Univ Waterloo, Dept Actuarial Sci & Appl Math, Waterloo, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
stochastic gradient descent; parameter estimation; non-convex optimization; supervised learning; generalized linear models; tensor PCA; EMPIRICAL RISK; THRESHOLDS; RECOVERY; INITIALIZATION; LANDSCAPE;
DOI
Not available
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss, which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension, and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that, except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law-of-large-numbers-type behavior. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
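To make the online dynamics concrete, here is a minimal sketch, assuming a noiseless phase-retrieval model y = <a, theta*>^2 with Gaussian sensing vectors a; all names, constants, and the step-size and sample-budget scalings below are illustrative choices, not taken from the paper. In the correlation m = <theta, theta*>, the population loss of this model is 1 - m^2, so its information exponent is 2. Each iteration consumes one fresh sample and re-projects the iterate onto the unit sphere.

import numpy as np

# Minimal illustrative sketch (not the authors' code): online SGD for
# noiseless phase retrieval y = <a, theta*>^2, with one fresh Gaussian
# sample per step and projection back to the unit sphere.

rng = np.random.default_rng(0)

d = 1000                              # ambient dimension
n_samples = 40 * d * int(np.log(d))   # sample budget; scaling illustrative only
step = 0.02 / d                       # step size; constant untuned

theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)   # ground truth on the unit sphere

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)             # random start: correlation ~ 1/sqrt(d)

for _ in range(n_samples):
    a = rng.standard_normal(d)             # fresh sample each iteration (online)
    y = a.dot(theta_star) ** 2
    pred = a.dot(theta)
    # gradient of the single-sample loss (y - <a, theta>^2)^2 / 4
    grad = -(y - pred ** 2) * pred * a
    theta -= step * grad
    theta /= np.linalg.norm(theta)         # enforce the spherical constraint

m = abs(theta.dot(theta_star))             # sign of theta* is not identifiable
print(f"final correlation |<theta, theta*>| = {m:.3f}")

In line with the abstract, one expects the correlation to linger near its initial 1/sqrt(d) scale for most of the run (the search phase) and then climb rapidly toward 1 (the descent phase); a model with a different information exponent would change the required sample budget accordingly.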
Pages: 51