An Efficient Method for Training Deep Learning Networks Distributed

Times Cited: 0
Authors
Wang, Chenxu [1 ]
Lu, Yutong [2 ,3 ]
Chen, Zhiguang [2 ,3 ]
Li, Junnan [1 ]
Affiliations
[1] National University of Defense Technology, School of Computer Science, Changsha 410073, China
[2] National Supercomputer Center in Guangzhou, Guangzhou 510006, China
[3] Sun Yat-sen University, School of Data and Computer Science, Guangzhou 510006, China
Keywords
deep learning; distributed training; hierarchical synchronous stochastic gradient descent; data parallelism
DOI
10.1587/transinf.2020PAP0007
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Training a deep learning (DL) network is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, are equipped with abundant computing and storage resources and efficient interconnects, and can therefore train DL networks better and faster. In this paper, we propose a method for training DL networks in a distributed setting with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first-layer models through shared memory. Third, we optimize parallel I/O by having each reader read data as contiguously as possible, avoiding the high overhead of discontinuous reads. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods: compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) increases computing efficiency by about 20 times.
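Two of the mechanisms summarized in the abstract lend themselves to a concrete illustration: the two-level gradient aggregation behind the hierarchical synchronous SGD strategy (reduce within a node, allreduce across node leaders, broadcast back) and the layer-wise learning-rate scaling of LARS. The sketch below is not the authors' implementation; the use of mpi4py and NumPy, the communicator layout, and all function names and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of two-level (hierarchical) synchronous gradient aggregation
# plus a LARS-style layer-wise learning rate, in the spirit of the abstract.
# Hypothetical code: communicator layout, function names, and hyperparameters
# are assumptions, not the paper's implementation.
import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD

# Level 1: processes that share a physical node (and hence shared memory).
node = world.Split_type(MPI.COMM_TYPE_SHARED)
is_leader = node.Get_rank() == 0

# Level 2: one leader process per node communicates across nodes.
leader = world.Split(0 if is_leader else MPI.UNDEFINED, world.Get_rank())


def hierarchical_allreduce(grad: np.ndarray) -> np.ndarray:
    """Average a gradient over all workers in three stages:
    intra-node reduce -> inter-node allreduce (leaders only) -> intra-node broadcast."""
    agg = np.zeros_like(grad)
    node.Reduce(grad, agg, op=MPI.SUM, root=0)           # stage 1: sum within the node
    if is_leader:
        leader.Allreduce(MPI.IN_PLACE, agg, op=MPI.SUM)  # stage 2: sum across nodes
    node.Bcast(agg, root=0)                              # stage 3: redistribute on the node
    return agg / world.Get_size()


def lars_local_lr(w: np.ndarray, g: np.ndarray, base_lr: float,
                  trust: float = 0.001, weight_decay: float = 5e-4) -> float:
    """LARS (You et al.): scale the base learning rate per layer by the ratio
    of the weight norm to the (regularized) gradient norm."""
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    return base_lr * trust * w_norm / (g_norm + weight_decay * w_norm + 1e-9)


if __name__ == "__main__":
    # Toy step: every worker holds the same weights and a gradient of ones.
    w = np.full(4, 0.5)
    g = hierarchical_allreduce(np.ones(4))
    w -= lars_local_lr(w, g, base_lr=0.1) * g
    if world.Get_rank() == 0:
        print("updated weights:", w)
```

With this layout, only one process per node takes part in the cross-node allreduce, so inter-node traffic shrinks from one message per worker to one per node, while the intra-node stage can run entirely over shared memory; this appears to be the intent of the two-level synchronization scheme described above.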
Pages: 2444-2456
Number of pages: 13
References (35 in total)
• [11] Geist, A., Euro-Par '96 Parallel Processing: Second International Euro-Par Conference, Proceedings, p. 128, 1996.
• [12] Georganas, E., SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, 2018.
• [13] Goldberg, Y., "A Primer on Neural Network Models for Natural Language Processing," Journal of Artificial Intelligence Research, vol. 57, pp. 345-420, 2016.
• [14] Goyal, P., et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour," arXiv:1706.02677, 2017.
• [15] Gropp, W., Lusk, E., Doss, N., and Skjellum, A., "A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard," Parallel Computing, vol. 22, no. 6, pp. 789-828, 1996.
• [16] Han, S., Proceedings of the International Conference on Learning Representations (ICLR), 2016.
• [17] He, K., et al., "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, doi: 10.1109/CVPR.2016.90.
• [18] Jin, P. H., CoRR, 2016.
• [19] Krizhevsky, A., "Learning Multiple Layers of Features from Tiny Images," Technical Report, University of Toronto, 2009.
• [20] Krizhevsky, A., "One Weird Trick for Parallelizing Convolutional Neural Networks," arXiv:1404.5997, 2014, doi: 10.48550/arXiv.1404.5997.