Convergence of Distributed Stochastic Variance Reduced Methods Without Sampling Extra Data

Cited by: 13
Authors
Cen, Shicong [1 ]
Zhang, Huishuai [2 ]
Chi, Yuejie [1 ]
Chen, Wei [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Funding
US National Science Foundation (NSF)
Keywords
Distributed optimization; stochastic optimization; master/slave model; variance reduction
DOI
10.1109/TSP.2020.3005291
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Stochastic variance reduced methods have recently gained a lot of interest for empirical risk minimization due to their appealing run-time complexity. When the data size is large and disjointly stored on different machines, it becomes imperative to distribute the implementation of such variance reduced methods. In this paper, we consider a general framework that directly distributes popular stochastic variance reduced methods in the master/slave model by assigning outer loops to the parameter server and inner loops to worker machines. This framework is natural and easy to implement, but its theoretical convergence is not well understood. We obtain a comprehensive understanding of algorithmic convergence with respect to data homogeneity by measuring the smoothness of the discrepancy between the local and global loss functions. We establish the linear convergence of distributed versions of a family of stochastic variance reduced algorithms, including those using accelerated and recursive gradient updates, for minimizing strongly convex losses. Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary. Furthermore, we show that when the data are less balanced, regularization can be used to ensure convergence, albeit at a slower rate. We also demonstrate that our analysis can be further extended to handle nonconvex loss functions.
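The abstract describes the framework only at a high level: the parameter server (master) runs the outer loop of a variance reduced method, while each worker runs the inner loop on its own disjoint shard of the data. The listing below is a minimal single-process sketch of that master/slave assignment for SVRG-style updates on a synthetic ridge-regression problem. It illustrates the loop split only, not the paper's exact algorithm or analysis; every name and constant in it (n_workers, svrg_inner, local_grad, the step size and loop counts) is hypothetical.

# Minimal single-process simulation of the master/slave assignment described in
# the abstract: the master (parameter server) runs the outer loop, each worker
# runs an SVRG-style inner loop on its own disjoint local data.
# All names, constants, and the ridge-regression loss are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

n_workers, n_local, dim, lam = 4, 200, 10, 0.1           # hypothetical sizes
w_star = rng.normal(size=dim)                             # ground-truth model
A = [rng.normal(size=(n_local, dim)) for _ in range(n_workers)]
b = [Ak @ w_star + 0.1 * rng.normal(size=n_local) for Ak in A]

def local_grad(k, w, idx=None):
    # Gradient of worker k's regularized least-squares loss,
    # over all local samples or over the minibatch `idx`.
    Ak, bk = (A[k], b[k]) if idx is None else (A[k][idx], b[k][idx])
    return Ak.T @ (Ak @ w - bk) / len(bk) + lam * w

def svrg_inner(k, w_snap, g_full, n_steps=100, lr=0.01):
    # Worker k's inner loop: variance-reduced stochastic steps started
    # from the snapshot w_snap broadcast by the master.
    w = w_snap.copy()
    for _ in range(n_steps):
        i = [rng.integers(n_local)]
        v = local_grad(k, w, i) - local_grad(k, w_snap, i) + g_full
        w -= lr * v
    return w

# Master's outer loop: aggregate the workers' full local gradients, broadcast
# the snapshot and aggregated gradient, then average the inner-loop iterates.
w = np.zeros(dim)
for _ in range(30):
    g_full = np.mean([local_grad(k, w) for k in range(n_workers)], axis=0)
    w = np.mean([svrg_inner(k, w, g_full) for k in range(n_workers)], axis=0)

print("distance to w_star:", np.linalg.norm(w - w_star))

In the sketch, the master's simple averaging of the workers' inner-loop iterates stands in for whatever aggregation rule a concrete distributed implementation would use; how convergence behaves as the number of machines and the local data size vary is precisely what the paper's theory characterizes.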
Pages: 3976 - 3989
Page count: 14