Understanding Distributed Deep Learning Performance by Correlating HPC and Machine Learning Measurements

Cited by: 1
Authors
Veroneze Solorzano, Ana Luisa [1 ]
Schnorr, Lucas Mello [1 ]
Affiliations
[1] Informat Inst PPGC UFRGS, Porto Alegre, RS, Brazil
Source
HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2022 | 2022 / Vol. 13289
Keywords
Distributed Deep Learning; Performance analysis; HPC;
DOI
10.1007/978-3-031-07312-0_14
CLC Number
TP301 [Theory, Methods];
Discipline Code
081202;
Abstract
Frameworks for Distributed Deep Learning (DDL) have become popular alternatives for distributing training by adding a few lines of code to a single-node script. From a High-Performance Computing (HPC) perspective, the traditional profiling tools used in Machine Learning (ML) research fail to expose details about distributed training performance, such as synchronization points, communication and computation time, and device usage throughout training. Moreover, these results are usually considered independently. We present a methodology for the performance analysis of DDL frameworks that combines HPC and ML tools, applying intrusive and non-intrusive tracing to enrich the findings of a strong-scaling evaluation on three clusters with different GPU models. We selected two modern DDL frameworks: Horovod and Tarantella. Using spatial and temporal analysis, we identify bottlenecks in the frameworks, such as a long initialization time for Horovod and the non-distribution of data during the testing phase for Tarantella. We extract performance measurements using temporal aggregation over the training phases, which can help DDL framework developers improve their tools. Horovod presented the best scaling efficiency for 4 GPUs or more, reaching up to 84.6% scaling efficiency for 4 GPUs and a large batch size, while Tarantella achieves 54.7% for the same case. Using our temporal aggregation approach, we identified that this result stems from Horovod processing an epoch faster than Tarantella.
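The abstract notes that DDL frameworks such as Horovod distribute training by adding only a few lines of code to a single-node script. As an illustration only (not taken from the paper), the following minimal sketch shows what such a change typically looks like with Horovod's TensorFlow Keras API; the toy model and synthetic data are stand-ins, not the workloads studied in the paper.

import numpy as np
import tensorflow as tf
import horovod.tensorflow.keras as hvd  # assumes Horovod built with TensorFlow support

hvd.init()  # one process per GPU, launched e.g. via `horovodrun -np 4 python train.py`

# Pin each worker process to a single local GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Synthetic stand-in data and a toy model (illustrative only).
x = np.random.rand(1024, 32).astype('float32')
y = np.random.randint(0, 10, size=(1024,))
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate with the number of workers and wrap the optimizer
# so gradients are averaged across processes via allreduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

# Broadcast the initial weights from rank 0 so every worker starts identically.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(x, y, batch_size=64, epochs=2, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)

The Horovod-specific additions are the init call, the GPU pinning, the wrapped optimizer, and the broadcast callback; the rest is an ordinary single-node Keras script, which is the property the abstract refers to.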
Pages: 275-292
Number of pages: 18