A Stochastic Second-Order Proximal Method for Distributed Optimization

Times Cited: 3
Authors
Qiu, Chenyang [1 ]
Zhu, Shanying [2 ]
Ou, Zichong [1 ]
Lu, Jie [3 ,4 ]
Affiliations
[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automation, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[4] ShanghaiTech Univ, Shanghai Engn Res Ctr Energy Efficient & Custom AI, Shanghai 201210, Peoples R China
Source
IEEE CONTROL SYSTEMS LETTERS | 2023, Vol. 7
Funding
National Natural Science Foundation of China;
Keywords
Convergence; Optimization; Approximation algorithms; Lagrangian functions; Upper bound; Stochastic processes; Taylor series; Distributed optimization; second-order method; stochastic optimization; ALGORITHM;
DOI
10.1109/LCSYS.2023.3244740
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
We propose a distributed stochastic second-order proximal (St-SoPro) method that enables agents in a network to cooperatively minimize the sum of their local loss functions without any centralized coordination. St-SoPro incorporates a decentralized second-order approximation into an augmented Lagrangian function and randomly samples the local gradients and Hessian matrices at each update, making it efficient for large-scale problems. We show that for restricted strongly convex and smooth problems, the agents linearly converge in expectation to a neighborhood of the optimum, and the neighborhood can be made arbitrarily small under proper parameter settings. Simulations on real machine learning datasets demonstrate that St-SoPro outperforms several state-of-the-art methods in terms of convergence speed as well as computation and communication costs.
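The abstract describes St-SoPro only at a high level; the exact iteration is given in the paper itself. As a hedged illustration of the general pattern it outlines — each agent samples a mini-batch to form gradient and Hessian estimates, minimizes a second-order surrogate of an augmented-Lagrangian-style local objective, and updates dual variables enforcing consensus with its neighbors — the NumPy sketch below may help. The graph, the parameters rho and alpha, the surrogate, and the dual update are illustrative assumptions, not the paper's St-SoPro update.

```python
# Illustrative sketch (NOT the St-SoPro iteration from the paper): each agent takes a
# stochastic second-order proximal step on an augmented-Lagrangian-style local
# surrogate, then performs a dual update on the consensus constraints.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic decentralized logistic-regression problem (assumed setup).
n_agents, d, m_local, batch = 4, 5, 200, 32
w_true = rng.normal(size=d)
data = []
for _ in range(n_agents):
    X = rng.normal(size=(m_local, d))
    y = (X @ w_true + 0.1 * rng.normal(size=m_local) > 0).astype(float)
    data.append((X, y))

# Ring communication graph: each agent talks to two neighbors.
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

def sampled_grad_hess(w, X, y, reg=1e-2):
    """Mini-batch gradient and Hessian of a ridge-regularized logistic loss."""
    idx = rng.choice(X.shape[0], batch, replace=False)
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    g = Xb.T @ (p - yb) / batch + reg * w
    H = (Xb * (p * (1 - p))[:, None]).T @ Xb / batch + reg * np.eye(d)
    return g, H

# Algorithm parameters (illustrative choices, not taken from the paper).
rho, alpha, n_iters = 1.0, 0.5, 200
x = [np.zeros(d) for _ in range(n_agents)]    # primal variables
lam = [np.zeros(d) for _ in range(n_agents)]  # dual variables for consensus

for _ in range(n_iters):
    x_new = []
    for i in range(n_agents):
        g, H = sampled_grad_hess(x[i], *data[i])
        deg = len(neighbors[i])
        # Second-order proximal step: minimize the sampled quadratic model of the
        # local loss plus dual term, consensus penalty, and proximal term.
        A = H + (rho * deg + 1.0 / alpha) * np.eye(d)
        b = (H + np.eye(d) / alpha) @ x[i] - g - lam[i] + rho * sum(x[j] for j in neighbors[i])
        x_new.append(np.linalg.solve(A, b))
    for i in range(n_agents):
        # Dual ascent on the edge-wise consensus constraints.
        lam[i] += rho * sum(x_new[i] - x_new[j] for j in neighbors[i])
    x = x_new

print("max disagreement:", max(np.linalg.norm(x[i] - x[0]) for i in range(n_agents)))
```

Because the gradient and Hessian are re-sampled at every iteration, the agents reach only a neighborhood of consensus on the minimizer, which is consistent with the inexact (neighborhood) convergence guarantee stated in the abstract.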
Pages: 1405-1410
Number of pages: 6