Harnessing Smoothness to Accelerate Distributed Optimization

Cited by: 442
Authors
Qu, Guannan [1 ]
Li, Na [1 ]
Affiliations
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Source
IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS | 2018, Vol. 5, No. 3
Keywords
Distributed algorithms; multiagent systems; optimization methods; SUBGRADIENT METHODS; CONSENSUS;
DOI
10.1109/TCNS.2017.2698261
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate O(log t/√t) as centralized (sub)gradient descent (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, it is unclear how to harness the smoothness within the DGD framework to obtain a convergence rate comparable to CGD's. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, effectively harnesses the function's smoothness and converges to the optimum at a rate of O(1/t). If the objective function is additionally strongly convex, our algorithm converges at a linear rate. Both rates match the convergence rates of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve a fast and accurate estimate of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information, even if the objective function is strongly convex and smooth.
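The history-based gradient estimation described in the abstract can be illustrated with a short sketch. The update below follows the gradient-tracking pattern associated with this line of work: each agent mixes its iterate with its neighbors' through consensus weights and maintains a running estimate of the average gradient, corrected by the difference of successive local gradients. The quadratic local objectives, the ring network, the mixing matrix W, and the step size eta are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical setup: n agents, each holding a local quadratic
# f_i(x) = 0.5 * ||A_i x - b_i||^2; the global objective is their sum.
rng = np.random.default_rng(0)
n, d = 5, 3
A = [rng.standard_normal((10, d)) for _ in range(n)]
b = [rng.standard_normal(10) for _ in range(n)]

def grad(i, x):
    # Gradient of the local objective f_i at x.
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring network (illustrative choice).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta = 0.01                      # step size (assumed; would need tuning in practice)
x = np.zeros((n, d))            # per-agent iterates, one row per agent
s = np.array([grad(i, x[i]) for i in range(n)])  # gradient trackers

for t in range(2000):
    # Consensus step on the iterates, plus descent along the tracked gradient.
    x_new = W @ x - eta * s
    # Tracker update: consensus plus the change in local gradients, so that
    # each s_i asymptotically tracks the network-average gradient.
    s = W @ s + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new

# All agents should agree and lie close to the centralized least-squares solution.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print(np.max(np.abs(x - x_star)))
```

Because the trackers are initialized to the local gradients and the correction term preserves their sum, each s_i converges to the average gradient as consensus takes hold; this is the mechanism that lets such schemes attain the O(1/t) and linear rates the abstract claims, in contrast to plain DGD.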
Pages: 1245-1260
Page count: 16