On Biased Stochastic Gradient Estimation

Cited by: 0
Authors
Driggs, Derek [1 ]
Liang, Jingwei [2 ,3 ]
Schonlieb, Carola-Bibiane [1 ]
Affiliations
[1] Univ Cambridge, Dept Appl Math & Theoret Phys, Cambridge CB3 0WA, England
[2] Shanghai Jiao Tong Univ, Inst Nat Sci, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, Sch Math Sci, Shanghai 200240, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); European Union Horizon 2020;
Keywords
stochastic gradient descent; variance reduction; biased gradient estimation; OPTIMIZATION; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We present a uniform analysis of biased stochastic gradient methods for minimizing convex, strongly convex, and non-convex composite objectives, and identify settings where bias is useful in stochastic gradient estimation. The framework we present allows us to extend proximal support to biased algorithms, including SAG and SARAH, for the first time in the convex setting. We also use our framework to develop a new algorithm, Stochastic Average Recursive GradiEnt (SARGE), that achieves the oracle complexity lower bound for non-convex, finite-sum objectives and requires strictly fewer calls to a stochastic gradient oracle per iteration than SVRG and SARAH. We support our theoretical results with numerical experiments that demonstrate the benefits of certain biased gradient estimators.
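The recursive estimator family the abstract refers to (SARAH, on which SARGE builds) can be illustrated concretely. Below is a minimal NumPy sketch of the SARAH recursion v_k = grad f_{i_k}(x_k) - grad f_{i_k}(x_{k-1}) + v_{k-1}; conditioned on the past, the expectation of v_k differs from the full gradient, which is what makes it a biased estimator. The function name, step size, and loop lengths are illustrative assumptions, and this is SARAH, not the paper's SARGE algorithm.

```python
import numpy as np

def sarah(grad_i, x0, n, n_epochs=10, inner_len=None, step=0.01, rng=None):
    """Minimal sketch of SARAH's biased recursive gradient estimator for
    a finite-sum objective f(x) = (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component f_i at x.
    All names and defaults are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    inner_len = n if inner_len is None else inner_len
    x = np.asarray(x0, dtype=float)
    for _ in range(n_epochs):
        x_prev = x
        # Anchor each outer loop with one full-gradient evaluation.
        v = np.mean([grad_i(x_prev, i) for i in range(n)], axis=0)
        x = x_prev - step * v
        for _ in range(inner_len):
            i = rng.integers(n)
            # Recursive update: conditioned on the past, E[v] != grad f(x),
            # so v is a biased estimate (unlike SVRG's unbiased correction).
            v = grad_i(x, i) - grad_i(x_prev, i) + v
            x_prev, x = x, x - step * v
    return x

# Toy usage on least squares, f_i(x) = 0.5 * (a_i @ x - b_i) ** 2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
x_hat = sarah(lambda x, i: (A[i] @ x - b[i]) * A[i], np.zeros(5), n=100, step=0.05)
```

Each inner iteration above makes two stochastic-oracle calls; per the abstract, the SARGE estimator is designed to need strictly fewer calls per iteration than SVRG and SARAH, though its exact update is not reproduced here.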
Pages: 43
Related Papers
50 records in total
  • [31] Adaptive Stochastic Gradient Descent (SGD) for erratic datasets
    Dagal, Idriss
    Tanrioven, Kursat
    Nayir, Ahmet
    Akin, Burak
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2025, 166
  • [32] Nested Distributed Gradient Methods with Stochastic Computation Errors
    Iakovidou, Charikleia
    Wei, Ermin
    2019 57TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2019, : 339 - 346
  • [33] Stochastic Gradient Descent with Polyak's Learning Rate
    Prazeres, Mariana
    Oberman, Adam M.
    JOURNAL OF SCIENTIFIC COMPUTING, 2021, 89
  • [34] Distributed and asynchronous Stochastic Gradient Descent with variance reduction
    Ming, Yuewei
    Zhao, Yawei
    Wu, Chengkun
    Li, Kuan
    Yin, Jianping
    NEUROCOMPUTING, 2018, 281 : 27 - 36
  • [35] On Almost Sure Convergence Rates of Stochastic Gradient Methods
    Liu, Jun
    Yuan, Ye
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022
  • [36] Adjusted stochastic gradient descent for latent factor analysis
    Li, Qing
    Xiong, Diwen
    Shang, Mingsheng
    INFORMATION SCIENCES, 2022, 588 : 196 - 213
  • [37] Stochastic Gradient Descent with Polyak's Learning Rate
    Prazeres, Mariana
    Oberman, Adam M.
    JOURNAL OF SCIENTIFIC COMPUTING, 2021, 89 (01)
  • [38] Stochastic gradient descent for semilinear elliptic equations with uncertainties
    Wang, Ting
    Knap, Jaroslaw
    JOURNAL OF COMPUTATIONAL PHYSICS, 2021, 426
  • [39] Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
    Allen-Zhu, Zeyuan
    STOC'17: PROCEEDINGS OF THE 49TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2017, : 1200 - 1205
  • [40] SAAGs: Biased stochastic variance reduction methods for large-scale learning
    Chauhan, Vinod Kumar
    Sharma, Anuj
    Dahiya, Kalpana
    APPLIED INTELLIGENCE, 2019, 49 : 3331 - 3361