An Efficient Method for Training Deep Learning Networks Distributed

Cited by: 0
Authors
Wang, Chenxu [1 ]
Lu, Yutong [2 ,3 ]
Chen, Zhiguang [2 ,3 ]
Li, Junnan [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Peoples R China
[2] Natl Supercomp Ctr Guangzhou, Guangzhou 510006, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Peoples R China
Keywords
deep learning; distributed training; hierarchical synchronous stochastic gradient descent; data-parallelism;
DOI
10.1587/transinf.2020PAP0007
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High-performance computing clusters, especially supercomputers, provide abundant computing and storage resources together with efficient interconnects, and can therefore train DL networks better and faster. In this paper, we propose a method for training DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first-layer models through shared memory. Third, we optimize parallel I/O by making each reader read data as contiguously as possible, avoiding the high overhead of discontinuous reads. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has substantial performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
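
The sketch below is a minimal NumPy illustration, not code from the paper, of two ideas named in the abstract: two-level gradient averaging (intra-node first, then across nodes) for hierarchical synchronous SGD, and the layer-wise trust ratio used by LARS. The function names (hierarchical_average, lars_update), the hyperparameter values, and the toy 2-node x 4-worker setup are illustrative assumptions; a real implementation would exchange the per-node averages over the interconnect (e.g., MPI) and use shared memory within a node rather than running in a single process.

```python
# Hedged sketch of hierarchical synchronous SGD with LARS (assumed details).
import numpy as np

def hierarchical_average(grads, workers_per_node):
    """Two-level averaging: average within each node, then across nodes."""
    grads = np.asarray(grads)                           # shape: (num_workers, dim)
    num_nodes = len(grads) // workers_per_node
    node_means = grads.reshape(num_nodes, workers_per_node, -1).mean(axis=1)
    return node_means.mean(axis=0)                      # global average gradient

def lars_update(weight, grad, base_lr=0.1, trust_coeff=0.001, weight_decay=1e-4):
    """One LARS step for a single layer (common formulation of the trust ratio)."""
    grad = grad + weight_decay * weight
    w_norm, g_norm = np.linalg.norm(weight), np.linalg.norm(grad)
    local_lr = trust_coeff * w_norm / (g_norm + 1e-12) if w_norm > 0 else 1.0
    return weight - base_lr * local_lr * grad

# Toy usage: 2 nodes x 4 workers, one 8-parameter "layer".
rng = np.random.default_rng(0)
weight = rng.normal(size=8)
worker_grads = [rng.normal(size=8) for _ in range(8)]
global_grad = hierarchical_average(worker_grads, workers_per_node=4)
weight = lars_update(weight, global_grad)
print(weight)
```

Because all nodes hold the same number of workers, the average of the per-node averages equals the global average gradient, so the two-level scheme changes only where the communication happens, not the result of a synchronous step.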
Pages: 2444-2456
Number of pages: 13