Improving the Transient Times for Distributed Stochastic Gradient Methods

Cited: 8
Authors
Huang, Kun [1 ,2 ]
Pu, Shi [1 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Chinese Univ Hong Kong, Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen 518172, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Distributed optimization; stochastic gradient methods; convex optimization; LEARNING-BEHAVIOR; OPTIMIZATION; CONVERGENCE; ALGORITHMS; DIFFUSION; CONSENSUS
DOI
10.1109/TAC.2022.3201141
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
We consider the distributed optimization problem in which n agents, each possessing a local cost function, collaboratively minimize the average of the n cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS), adapted from the Exact Diffusion method (Yuan et al., 2019) and NIDS (Li et al., 2019), and perform a non-asymptotic convergence analysis. We not only show that EDAS asymptotically achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD) for minimizing strongly convex and smooth objective functions, but also characterize the transient time needed for the algorithm to approach this asymptotic rate, which behaves as $K_T = \mathcal{O}\left(\frac{n}{1-\lambda_2}\right)$, where $1 - \lambda_2$ denotes the spectral gap of the mixing matrix. To the best of our knowledge, EDAS achieves the shortest transient time when the average of the n cost functions is strongly convex and each cost function is smooth. Numerical simulations further corroborate and strengthen the obtained theoretical results.
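As an illustration of the kind of update the abstract describes, below is a minimal NumPy sketch of an Exact-Diffusion-style recursion (the adapt-correct-combine form of Yuan et al., 2019) run with a decaying stepsize, which is the "adaptive stepsize" ingredient of EDAS. The quadratic local costs, the ring mixing matrix, the O(1/k) stepsize schedule, and the noise level are illustrative assumptions; the paper's exact EDAS recursion and stepsize conditions may differ, so treat this as a sketch rather than the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 8, 5                                        # number of agents, problem dimension
A = rng.normal(size=(n, d, d))
Q = np.einsum('nij,nkj->nik', A, A) + np.eye(d)    # local Hessians Q_i = A_i A_i^T + I (strongly convex)
b = rng.normal(size=(n, d))                        # local linear terms

def stoch_grad(x, sigma=0.1):
    """Noisy gradients of f_i(x_i) = 0.5 x_i^T Q_i x_i - b_i^T x_i, one row per agent."""
    g = np.einsum('nij,nj->ni', Q, x) - b
    return g + sigma * rng.normal(size=g.shape)    # additive gradient noise (assumed level)

# Doubly stochastic mixing matrix for a ring graph (illustrative topology).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25
W_bar = 0.5 * (np.eye(n) + W)   # Exact Diffusion mixes with (I + W)/2

x = rng.normal(size=(n, d))     # each row is one agent's iterate
psi_prev = x.copy()
for k in range(2000):
    alpha = 1.0 / (k + 50)               # decaying stepsize (assumed O(1/k) schedule)
    psi = x - alpha * stoch_grad(x)      # adapt: local stochastic gradient step
    phi = psi + x - psi_prev             # correct: cancels the steady-state bias of plain diffusion
    x = W_bar @ phi                      # combine: average with neighbors
    psi_prev = psi

# The average cost 0.5 x^T (sum Q_i) x - (sum b_i)^T x is minimized at x_star.
x_star = np.linalg.solve(Q.sum(axis=0), b.sum(axis=0))
print("mean distance to optimum:", np.linalg.norm(x - x_star, axis=1).mean())
```

With the decaying stepsize, all agents' rows of x contract toward x_star; with a constant stepsize the iterates would instead hover in a noise-dominated neighborhood of the optimum, which is why the transient-time analysis in the abstract concerns the decaying-stepsize regime.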
Pages: 4127-4142
Page count: 16