Convergence of Distributed Stochastic Variance Reduced Methods Without Sampling Extra Data

Cited by: 13
Authors
Cen, Shicong [1 ]
Zhang, Huishuai [2 ]
Chi, Yuejie [1 ]
Chen, Wei [2 ]
Liu, Tie-Yan [2 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Funding
US National Science Foundation (NSF)
Keywords
Distributed optimization; stochastic optimization; master/slave model; variance reduction
DOI
10.1109/TSP.2020.3005291
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Stochastic variance reduced methods have recently gained a lot of interest for empirical risk minimization due to their appealing run-time complexity. When the data size is large and disjointly stored on different machines, it becomes imperative to distribute the implementation of such variance reduced methods. In this paper, we consider a general framework that directly distributes popular stochastic variance reduced methods in the master/slave model by assigning outer loops to the parameter server and inner loops to worker machines. This framework is natural and easy to implement, but its theoretical convergence is not well understood. We obtain a comprehensive understanding of algorithmic convergence with respect to data homogeneity by measuring the smoothness of the discrepancy between the local and global loss functions. We establish the linear convergence of distributed versions of a family of stochastic variance reduced algorithms, including those using accelerated and recursive gradient updates, for minimizing strongly convex losses. Our theory captures how the convergence of distributed algorithms behaves as the number of machines and the size of local data vary. Furthermore, we show that when the data are less balanced, regularization can be used to ensure convergence, albeit at a slower rate. We also demonstrate that our analysis can be further extended to handle nonconvex loss functions.
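The abstract describes the framework only at a high level: the parameter server (master) runs the outer loop of a variance reduced method, while each worker runs the inner loop on its own disjoint shard of the data. The listing below is a minimal single-process sketch of that master/slave assignment for SVRG-style updates on a synthetic ridge-regression problem. It illustrates the loop split only, not the paper's exact algorithm or analysis; every name and constant in it (n_workers, svrg_inner, local_grad, the step size and loop counts) is hypothetical.

# Minimal single-process simulation of the master/slave assignment described in
# the abstract: the master (parameter server) runs the outer loop, each worker
# runs an SVRG-style inner loop on its own disjoint local data.
# All names, constants, and the ridge-regression loss are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

n_workers, n_local, dim, lam = 4, 200, 10, 0.1           # hypothetical sizes
w_star = rng.normal(size=dim)                             # ground-truth model
A = [rng.normal(size=(n_local, dim)) for _ in range(n_workers)]
b = [Ak @ w_star + 0.1 * rng.normal(size=n_local) for Ak in A]

def local_grad(k, w, idx=None):
    # Gradient of worker k's regularized least-squares loss,
    # over all local samples or over the minibatch `idx`.
    Ak, bk = (A[k], b[k]) if idx is None else (A[k][idx], b[k][idx])
    return Ak.T @ (Ak @ w - bk) / len(bk) + lam * w

def svrg_inner(k, w_snap, g_full, n_steps=100, lr=0.01):
    # Worker k's inner loop: variance-reduced stochastic steps started
    # from the snapshot w_snap broadcast by the master.
    w = w_snap.copy()
    for _ in range(n_steps):
        i = [rng.integers(n_local)]
        v = local_grad(k, w, i) - local_grad(k, w_snap, i) + g_full
        w -= lr * v
    return w

# Master's outer loop: aggregate the workers' full local gradients, broadcast
# the snapshot and aggregated gradient, then average the inner-loop iterates.
w = np.zeros(dim)
for _ in range(30):
    g_full = np.mean([local_grad(k, w) for k in range(n_workers)], axis=0)
    w = np.mean([svrg_inner(k, w, g_full) for k in range(n_workers)], axis=0)

print("distance to w_star:", np.linalg.norm(w - w_star))

In the sketch, the master's simple averaging of the workers' inner-loop iterates stands in for whatever aggregation rule a concrete distributed implementation would use; how convergence behaves as the number of machines and the local data size vary is precisely what the paper's theory characterizes.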
Pages: 3976 - 3989
Page count: 14