Training deep neural networks: a static load balancing approach

Cited by: 11
Authors
Moreno-Alvarez, Sergio [1 ]
Haut, Juan M. [2 ]
Paoletti, Mercedes E. [2 ]
Rico-Gallego, Juan A. [1 ]
Diaz-Martin, Juan C. [2 ]
Plaza, Javier [2 ]
Affiliations
[1] Univ Extremadura, Dept Comp Syst Engn & Telemat, Caceres, Spain
[2] Univ Extremadura, Dept Technol Comp & Commun, Caceres, Spain
Keywords
Deep learning; High-performance computing; Distributed training; Heterogeneous platforms
DOI
10.1007/s11227-020-03200-6
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and trained on non-overlapping subsets of the data known as batches. Replicas combine their computed gradients to update their local copies at the end of each batch. However, differences in the performance of the resources assigned to the replicas on current heterogeneous platforms induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: training in each replica proceeds with a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, thereby minimizing the staleness problem. Our experimental results, obtained in the context of a remotely sensed hyperspectral image processing application, show that the training time decreases substantially with respect to unbalanced training while the classification accuracy remains constant. This is illustrated on heterogeneous computing platforms made up of CPUs and GPUs with different performance.
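To make the batch-sizing idea concrete, the following is a minimal Python sketch (not the authors' implementation) of how a fixed global batch size could be split across heterogeneous workers in proportion to their measured throughput. The function name, the example throughput values, and the assumption that throughput comes from a warm-up measurement are all illustrative.

# Minimal sketch (illustrative only): split a fixed global batch size
# across heterogeneous workers in proportion to their throughput, so
# faster devices get larger local batches and replicas finish each
# synchronous step at roughly the same time.

def proportional_batch_sizes(global_batch, throughputs):
    """Split `global_batch` samples across workers proportionally to
    `throughputs` (e.g., samples/second measured in a warm-up run)."""
    total = sum(throughputs)
    # Initial proportional shares, rounded down.
    shares = [int(global_batch * t / total) for t in throughputs]
    # Hand the leftover samples to the fastest workers so the shares
    # still sum exactly to the global batch size.
    remainder = global_batch - sum(shares)
    fastest_first = sorted(range(len(throughputs)),
                           key=lambda i: throughputs[i], reverse=True)
    for i in fastest_first[:remainder]:
        shares[i] += 1
    return shares

if __name__ == "__main__":
    # Hypothetical platform: one GPU roughly 4x faster than each of two CPUs.
    print(proportional_batch_sizes(128, [400.0, 100.0, 100.0]))  # -> [86, 21, 21]

In this sketch the global batch size (and hence the effective gradient step) stays fixed; only the per-replica workload changes, which is what allows accuracy to be preserved while idle time is reduced.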
Pages: 9739-9754
Number of pages: 16