A Stochastic Second-Order Proximal Method for Distributed Optimization

Times Cited: 3
Authors
Qiu, Chenyang [1 ]
Zhu, Shanying [2 ]
Ou, Zichong [1 ]
Lu, Jie [3 ,4 ]
Affiliations
[1] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Automation, Key Lab Syst Control & Informat Proc, Shanghai 200240, Peoples R China
[3] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
[4] ShanghaiTech Univ, Shanghai Engn Res Ctr Energy Efficient & Custom AI, Shanghai 201210, Peoples R China
Source
IEEE CONTROL SYSTEMS LETTERS | 2023, Vol. 7
Funding
National Natural Science Foundation of China;
Keywords
Convergence; Optimization; Approximation algorithms; Lagrangian functions; Upper bound; Stochastic processes; Taylor series; Distributed optimization; second-order method; stochastic optimization; ALGORITHM;
DOI
10.1109/LCSYS.2023.3244740
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
We propose a distributed stochastic second-order proximal (St-SoPro) method that enables agents in a network to cooperatively minimize the sum of their local loss functions without any centralized coordination. St-SoPro incorporates a decentralized second-order approximation into an augmented Lagrangian function and randomly samples the local gradients and Hessian matrices at each update, making it efficient for large-scale problems. We show that for restricted strongly convex and smooth problems, the agents linearly converge in expectation to a neighborhood of the optimum, and the neighborhood can be made arbitrarily small under proper parameter settings. Simulations on real machine learning datasets demonstrate that St-SoPro outperforms several state-of-the-art methods in terms of convergence speed as well as computation and communication costs.
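The abstract describes St-SoPro only at a high level; the exact iteration is given in the paper itself. As a hedged illustration of the general pattern it outlines — each agent samples a mini-batch to form gradient and Hessian estimates, minimizes a second-order surrogate of an augmented-Lagrangian-style local objective, and updates dual variables enforcing consensus with its neighbors — the NumPy sketch below may help. The graph, the parameters rho and alpha, the surrogate, and the dual update are illustrative assumptions, not the paper's St-SoPro update.

```python
# Illustrative sketch (NOT the St-SoPro iteration from the paper): each agent takes a
# stochastic second-order proximal step on an augmented-Lagrangian-style local
# surrogate, then performs a dual update on the consensus constraints.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic decentralized logistic-regression problem (assumed setup).
n_agents, d, m_local, batch = 4, 5, 200, 32
w_true = rng.normal(size=d)
data = []
for _ in range(n_agents):
    X = rng.normal(size=(m_local, d))
    y = (X @ w_true + 0.1 * rng.normal(size=m_local) > 0).astype(float)
    data.append((X, y))

# Ring communication graph: each agent talks to two neighbors.
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

def sampled_grad_hess(w, X, y, reg=1e-2):
    """Mini-batch gradient and Hessian of a ridge-regularized logistic loss."""
    idx = rng.choice(X.shape[0], batch, replace=False)
    Xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    g = Xb.T @ (p - yb) / batch + reg * w
    H = (Xb * (p * (1 - p))[:, None]).T @ Xb / batch + reg * np.eye(d)
    return g, H

# Algorithm parameters (illustrative choices, not taken from the paper).
rho, alpha, n_iters = 1.0, 0.5, 200
x = [np.zeros(d) for _ in range(n_agents)]    # primal variables
lam = [np.zeros(d) for _ in range(n_agents)]  # dual variables for consensus

for _ in range(n_iters):
    x_new = []
    for i in range(n_agents):
        g, H = sampled_grad_hess(x[i], *data[i])
        deg = len(neighbors[i])
        # Second-order proximal step: minimize the sampled quadratic model of the
        # local loss plus dual term, consensus penalty, and proximal term.
        A = H + (rho * deg + 1.0 / alpha) * np.eye(d)
        b = (H + np.eye(d) / alpha) @ x[i] - g - lam[i] + rho * sum(x[j] for j in neighbors[i])
        x_new.append(np.linalg.solve(A, b))
    for i in range(n_agents):
        # Dual ascent on the edge-wise consensus constraints.
        lam[i] += rho * sum(x_new[i] - x_new[j] for j in neighbors[i])
    x = x_new

print("max disagreement:", max(np.linalg.norm(x[i] - x[0]) for i in range(n_agents)))
```

Because the gradient and Hessian are re-sampled at every iteration, the agents reach only a neighborhood of consensus on the minimizer, which is consistent with the inexact (neighborhood) convergence guarantee stated in the abstract.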
Pages: 1405-1410
Number of pages: 6