Distributed SGD With Flexible Gradient Compression

Cited: 15
Authors
Tran Thi Phuong [1]
Le Trieu Phong [2]
Affiliations
[1] Ton Duc Thang Univ, Fac Math & Stat, Ho Chi Minh City, Vietnam
[2] Natl Inst Informat & Commun Technol (NICT), Koganei, Tokyo 184-8795, Japan
Keywords
Stochastic optimizer; distributed SGD; communication efficiency; deep neural networks
DOI
10.1109/ACCESS.2020.2984633
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
We design and evaluate a new algorithm called FlexCompressSGD for training deep neural networks over distributed datasets via multiple workers and a central server. In FlexCompressSGD, all gradients transmitted between workers and the server are compressed, and the workers are allowed to flexibly choose a compression method different from that of the server. This flexibility significantly helps reduce the communication cost from each worker to the server. We mathematically prove that FlexCompressSGD converges at rate 1/√(MT), where M is the number of distributed workers and T is the number of training iterations. We experimentally demonstrate that FlexCompressSGD obtains competitive top-1 testing accuracy on the ImageNet dataset while reducing the communication cost from each worker to the server by more than 70% compared with the state of the art.
Pages: 64707-64717
Page count: 11
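
To make the flexible-compression idea from the abstract concrete, below is a minimal single-process sketch, not the authors' implementation: each worker compresses its gradient with one operator before sending it to the server, and the server aggregates and re-compresses with a possibly different operator before broadcasting the update. The specific compressor choices (top-k at the workers, scaled sign at the server), the error-feedback buffers, the toy objective, and all hyperparameters are illustrative assumptions.

import numpy as np

# Minimal simulation of distributed SGD where workers and server may use
# different gradient compressors. Illustrative sketch only, not the
# paper's exact FlexCompressSGD procedure.

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def scaled_sign(v):
    """1-bit style compressor: sign of v scaled by its mean magnitude."""
    return np.sign(v) * np.mean(np.abs(v))

def compressed_sgd_step(x, grads, worker_err, server_err, lr, k):
    """One iteration: workers compress their gradients (top-k here), the
    server aggregates and re-compresses (scaled sign here) before the
    broadcast; error-feedback buffers store what compression discarded."""
    msgs = []
    for m, g in enumerate(grads):                  # worker side
        p = worker_err[m] + g                      # add accumulated error
        c = top_k(p, k)                            # worker-chosen compressor
        worker_err[m] = p - c                      # remember what was dropped
        msgs.append(c)
    agg = server_err + np.mean(msgs, axis=0)       # server side
    bcast = scaled_sign(agg)                       # server-chosen compressor
    server_err[:] = agg - bcast
    return x - lr * bcast                          # every worker applies the same update

# Toy usage: minimize ||x||^2 with M = 4 workers and noisy gradients.
rng = np.random.default_rng(0)
d, M = 10, 4
x = rng.normal(size=d)
worker_err = [np.zeros(d) for _ in range(M)]
server_err = np.zeros(d)
for t in range(200):
    grads = [2 * x + 0.1 * rng.normal(size=d) for _ in range(M)]
    x = compressed_sgd_step(x, grads, worker_err, server_err, lr=0.05, k=3)
print("final ||x|| =", np.linalg.norm(x))

In this sketch the worker-to-server messages are sparse (k of d entries) while the server-to-worker broadcast is effectively one bit per coordinate plus a scale, illustrating how letting the two sides pick different compressors can cut the upstream communication cost that the abstract highlights.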