Harnessing Smoothness to Accelerate Distributed Optimization

Cited by: 442
Authors
Qu, Guannan [1 ]
Li, Na [1 ]
Affiliations
[1] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Source
IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS | 2018, Vol. 5, No. 3
Keywords
Distributed algorithms; multiagent systems; optimization methods; SUBGRADIENT METHODS; CONSENSUS;
DOI
10.1109/TCNS.2017.2698261
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
There has been a growing effort in studying the distributed optimization problem over a network. The objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. The literature has developed consensus-based distributed (sub)gradient descent (DGD) methods and has shown that they have the same convergence rate O(log t/√t) as centralized (sub)gradient descent (CGD) when the function is convex but possibly nonsmooth. However, when the function is convex and smooth, it is unclear how to harness the smoothness within the DGD framework to obtain a convergence rate comparable to CGD's. In this paper, we propose a distributed algorithm that, despite using the same amount of communication per iteration as DGD, effectively harnesses the function's smoothness and converges to the optimum at a rate of O(1/t). If the objective function is additionally strongly convex, our algorithm converges at a linear rate. Both rates match the convergence rates of CGD. The key step in our algorithm is a novel gradient estimation scheme that uses history information to achieve a fast and accurate estimate of the average gradient. To motivate the necessity of history information, we also show that it is impossible for a class of distributed algorithms like DGD to achieve a linear convergence rate without using history information, even if the objective function is strongly convex and smooth.
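The history-based gradient estimation described in the abstract can be illustrated with a short sketch. The update below follows the gradient-tracking pattern associated with this line of work: each agent mixes its iterate with its neighbors' through consensus weights and maintains a running estimate of the average gradient, corrected by the difference of successive local gradients. The quadratic local objectives, the ring network, the mixing matrix W, and the step size eta are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical setup: n agents, each holding a local quadratic
# f_i(x) = 0.5 * ||A_i x - b_i||^2; the global objective is their sum.
rng = np.random.default_rng(0)
n, d = 5, 3
A = [rng.standard_normal((10, d)) for _ in range(n)]
b = [rng.standard_normal(10) for _ in range(n)]

def grad(i, x):
    # Gradient of the local objective f_i at x.
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a ring network (illustrative choice).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta = 0.01                      # step size (assumed; would need tuning in practice)
x = np.zeros((n, d))            # per-agent iterates, one row per agent
s = np.array([grad(i, x[i]) for i in range(n)])  # gradient trackers

for t in range(2000):
    # Consensus step on the iterates, plus descent along the tracked gradient.
    x_new = W @ x - eta * s
    # Tracker update: consensus plus the change in local gradients, so that
    # each s_i asymptotically tracks the network-average gradient.
    s = W @ s + np.array([grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)])
    x = x_new

# All agents should agree and lie close to the centralized least-squares solution.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print(np.max(np.abs(x - x_star)))
```

Because the trackers are initialized to the local gradients and the correction term preserves their sum, each s_i converges to the average gradient as consensus takes hold; this is the mechanism that lets such schemes attain the O(1/t) and linear rates the abstract claims, in contrast to plain DGD.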
Pages: 1245-1260
Page count: 16