Delay-Adaptive Distributed Stochastic Optimization

Cited by: 0
Authors
Ren, Zhaolin [1]
Zhou, Zhengyuan [2,4]
Qiu, Linhai [3]
Deshpande, Ajay [4]
Kalagnanam, Jayant [4]
Affiliations
[1] Harvard Univ, Cambridge, MA 02138 USA
[2] NYU, New York, NY 10003 USA
[3] Google Inc, Menlo Pk, CA USA
[4] IBM Res, Yorktown Hts, NY USA
Source
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34
Keywords
CONVERGENCE;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
In large-scale optimization problems, distributed asynchronous stochastic gradient descent (DASGD) is a commonly used algorithm. In most applications, a large number of computing nodes compute gradient information asynchronously, so the gradient information received at a given iteration is often stale. In the presence of such delays, which can be unbounded, the convergence of DASGD is uncertain. The contribution of this paper is twofold. First, we propose a delay-adaptive variant of DASGD in which each iteration's step-size is adjusted according to the size of the delay, and we prove asymptotic convergence of the algorithm on variationally coherent stochastic problems, a class of functions that properly includes convex, quasi-convex, and star-convex functions. Second, we extend the convergence results of standard DASGD, usually established for problems with bounded domains, to problems with unbounded domains. In this way, we extend the frontier of theoretical guarantees for distributed asynchronous optimization and provide new insights for practitioners working on large-scale optimization problems.
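The abstract's key algorithmic idea is to shrink the step-size when an arriving gradient is stale. The sketch below simulates such a delay-adaptive update loop in a single process on a toy quadratic objective; the master-worker model, the random delay distribution, and the specific rule base_lr / (1 + delay) are illustrative assumptions, not the paper's exact algorithm or schedule.

```python
# Minimal single-process simulation of a delay-adaptive DASGD update loop.
# Assumptions (not from the paper): one master holding the iterate, workers
# returning gradients after random delays, and step-size base_lr / (1 + delay).
import random

import numpy as np


def stochastic_grad(x):
    """Noisy gradient of f(x) = 0.5 * ||x||^2."""
    return x + np.random.normal(scale=0.1, size=x.shape)


def delay_adaptive_dasgd(dim=5, num_workers=8, iterations=2000,
                         base_lr=0.5, max_delay=50, seed=0):
    rng = random.Random(seed)
    np.random.seed(seed)
    x = np.random.randn(dim)  # current iterate held by the master
    # Each in-flight job: (arrival_iteration, gradient, iteration_it_was_computed_at).
    inflight = [(rng.randint(1, max_delay), stochastic_grad(x), 0)
                for _ in range(num_workers)]

    for t in range(1, iterations + 1):
        # Gradients whose compute time has elapsed arrive at the master now.
        arrived = [job for job in inflight if job[0] <= t]
        inflight = [job for job in inflight if job[0] > t]
        for _, g, k in arrived:
            delay = t - k                    # staleness of this gradient
            step = base_lr / (1.0 + delay)   # smaller steps for staler gradients
            x = x - step * g
            # The freed worker starts a new gradient at the current iterate,
            # which will arrive after a fresh random delay.
            inflight.append((t + rng.randint(1, max_delay), stochastic_grad(x), t))
    return x


if __name__ == "__main__":
    x_final = delay_adaptive_dasgd()
    print("||x|| after training:", np.linalg.norm(x_final))
```

Under this toy model, stale gradients receive proportionally smaller weight, which is the qualitative behavior the abstract attributes to the delay-adaptive variant.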
Pages: 5503-5510
Page count: 8