DAdam: A Consensus-Based Distributed Adaptive Gradient Method for Online Optimization

Cited by: 18
Authors
Nazari, Parvin [1 ]
Tarzanagh, Davoud Ataee [2 ]
Michailidis, George [3 ]
Affiliations
[1] Amirkabir Univ Technol, Dept Math & Comp Sci, Tehran Polytech, Rasht St, Tehran 1591634311, Iran
[2] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA
[3] Univ Florida, Dept Stat, Gainesville, FL 32611 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Heuristic algorithms; Minimization; Convergence; Convex functions; Estimation; Costs; Stochastic processes; Decentralized optimization; Online learning; Regret bound; Adaptive methods; Subgradient methods; Algorithms
DOI
10.1109/TSP.2022.3223214
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Adaptive optimization methods such as AdaGrad, RMSProp, and Adam are widely used for solving large-scale machine learning problems. A number of schemes have been proposed to parallelize them based on communication between peripheral nodes and a central node, but these incur high communication costs. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (DAdam) for online optimization over a decentralized network that enables both data parallelization and decentralized computation. The method is particularly useful because it can accommodate settings where only local data access is permitted. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms for certain classes of loss functions used in machine learning applications. We analyze the convergence properties of the proposed algorithm and provide a regret bound on the convergence rate of adaptive moment estimation methods in both the online convex and non-convex settings. Empirical results demonstrate that DAdam also performs well in practice and compares favorably to competing online optimization methods.
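To make the high-level description above concrete, the sketch below illustrates one synchronous round of a consensus-style distributed Adam update in NumPy: each node averages its iterate with its neighbors' iterates through a mixing matrix and then takes a local adaptive step driven by Adam-style moment estimates of its own gradient. This is a minimal illustrative sketch based only on the abstract, not the paper's exact DAdam algorithm; the function name, the mixing matrix W, the hyperparameter values, the step-size schedule, and the ordering of the consensus and adaptive steps are assumptions.

import numpy as np

def dadam_round(x, m, v, grads, W, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One synchronous round of a consensus-based adaptive (Adam-style) update.

    x, m, v : (n_nodes, dim) local iterates and first/second moment estimates
    grads   : (n_nodes, dim) local (sub)gradients evaluated at the current iterates
    W       : (n_nodes, n_nodes) doubly stochastic mixing matrix of the network
    """
    x_mixed = W @ x                           # consensus: average with neighbors' iterates
    m = beta1 * m + (1 - beta1) * grads       # local first-moment estimate
    v = beta2 * v + (1 - beta2) * grads ** 2  # local second-moment estimate
    return x_mixed - lr * m / (np.sqrt(v) + eps), m, v

# Toy usage: 4 nodes on a ring, node i holds the local loss 0.5 * ||x - t_i||^2.
n, d = 4, 3
rng = np.random.default_rng(0)
targets = rng.normal(size=(n, d))
W = np.zeros((n, n))
for i in range(n):
    W[i, [i, (i - 1) % n, (i + 1) % n]] = 1.0 / 3.0  # uniform weights on self + two neighbors
x = np.zeros((n, d)); m = np.zeros((n, d)); v = np.zeros((n, d))
for t in range(1, 1001):
    grads = x - targets                              # gradient of each local quadratic
    x, m, v = dadam_round(x, m, v, grads, W, lr=0.05 / np.sqrt(t))
print("disagreement across nodes:", np.max(np.abs(x - x.mean(axis=0))))
print("distance to global optimum:", np.max(np.abs(x.mean(axis=0) - targets.mean(axis=0))))

The sketch only conveys the structure the abstract describes: iterates are mixed over the network while the adaptive moments stay local, so no central node is needed. The actual DAdam algorithm and its regret guarantees are specified in the paper itself.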
Pages: 6065-6079
Number of pages: 15