An Efficient Method for Training Deep Learning Networks Distributed

Cited by: 0
Authors
Wang, Chenxu [1 ]
Lu, Yutong [2 ,3 ]
Chen, Zhiguang [2 ,3 ]
Li, Junnan [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Peoples R China
[2] Natl Supercomp Ctr Guangzhou, Guangzhou 510006, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
Keywords
deep learning; distributed training; hierarchical synchronous stochastic gradient descent; data-parallelism;
DOI
10.1587/transinf.2020PAP0007
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, provide abundant computing and storage resources together with efficient interconnects, and can therefore train DL networks better and faster. In this paper, we propose a method for training DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first-layer models through shared memory. Third, we optimize parallel I/O by making each reader read data as contiguously as possible, avoiding the high overhead of discontinuous reads. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
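
The sketch below is a minimal NumPy illustration, not code from the paper, of two ideas named in the abstract: two-level gradient averaging (intra-node first, then across nodes) for hierarchical synchronous SGD, and the layer-wise trust ratio used by LARS. The function names (hierarchical_average, lars_update), the hyperparameter values, and the toy 2-node x 4-worker setup are illustrative assumptions; a real implementation would exchange the per-node averages over the interconnect (e.g., MPI) and use shared memory within a node rather than running in a single process.

```python
# Hedged sketch of hierarchical synchronous SGD with LARS (assumed details).
import numpy as np

def hierarchical_average(grads, workers_per_node):
    """Two-level averaging: average within each node, then across nodes."""
    grads = np.asarray(grads)                           # shape: (num_workers, dim)
    num_nodes = len(grads) // workers_per_node
    node_means = grads.reshape(num_nodes, workers_per_node, -1).mean(axis=1)
    return node_means.mean(axis=0)                      # global average gradient

def lars_update(weight, grad, base_lr=0.1, trust_coeff=0.001, weight_decay=1e-4):
    """One LARS step for a single layer (common formulation of the trust ratio)."""
    grad = grad + weight_decay * weight
    w_norm, g_norm = np.linalg.norm(weight), np.linalg.norm(grad)
    local_lr = trust_coeff * w_norm / (g_norm + 1e-12) if w_norm > 0 else 1.0
    return weight - base_lr * local_lr * grad

# Toy usage: 2 nodes x 4 workers, one 8-parameter "layer".
rng = np.random.default_rng(0)
weight = rng.normal(size=8)
worker_grads = [rng.normal(size=8) for _ in range(8)]
global_grad = hierarchical_average(worker_grads, workers_per_node=4)
weight = lars_update(weight, global_grad)
print(weight)
```

Because all nodes hold the same number of workers, the average of the per-node averages equals the global average gradient, so the two-level scheme changes only where the communication happens, not the result of a synchronous step.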
Pages: 2444-2456
Number of pages: 13