Empirical Performance Evaluation of Communication Libraries for Multi-GPU based Distributed Deep Learning in a Container Environment

Cited by: 1
Authors
Choi, HyeonSeong [1 ]
Kim, Youngrang [2 ]
Lee, Jaehwan [3 ]
Kim, Yoonhee [4 ]
Affiliations
[1] Korea Aerosp Univ, KAU, Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[2] Korea Aerosp Univ, Goyang City, Gyeonggi Do, South Korea
[3] Korea Aerosp Univ, Dept Elect & Informat Engn, Goyang City, Gyeonggi Do, South Korea
[4] Sookmyung Womens Univ, Comp Sci Dept, Seoul, South Korea
Source
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2021 / Vol. 15 / Issue 3
Funding
National Research Foundation of Singapore
Keywords
Docker; Collective Communication; Distributed Deep Learning; Multi-GPU; MPI
DOI
10.3837/tiis.2021.03.006
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Recently, most cloud services are provided in a Docker container environment. However, little research has evaluated the performance of communication libraries for multi-GPU based distributed deep learning in a Docker container environment. In this paper, we propose an efficient communication architecture for multi-GPU based deep learning in a Docker container environment by evaluating the performance of various communication libraries. We compare the performance of the parameter server architecture and the All-reduce architecture, which are typical distributed deep learning architectures. Furthermore, we analyze the performance of two multi-GPU resource allocation policies: allocating a single GPU to each Docker container and allocating multiple GPUs to each Docker container. We also examine the scalability of collective communication by increasing the number of GPUs from one to four. In our experiments, we compare OpenMPI and MPICH, which are representative open-source MPI libraries, and NCCL, NVIDIA's collective communication library for multi-GPU settings. In the parameter server architecture, we show that using CUDA-aware OpenMPI with multiple GPUs per Docker container reduces communication latency by up to 75%. In the All-reduce architecture, we show that using NCCL reduces communication latency by up to 93% compared to the other libraries.
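Illustrative sketch (not taken from the paper): the All-reduce architecture evaluated in the abstract exchanges and sums gradients across all workers with a single collective call. The minimal MPI program below averages a flattened gradient buffer with MPI_Allreduce; the buffer size and launch configuration are assumptions for illustration only. In the NCCL-based setup, ncclAllReduce plays the corresponding role, and with a CUDA-aware OpenMPI build the buffer passed to MPI_Allreduce could be a GPU device pointer rather than host memory.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch of the All-reduce gradient exchange pattern.
 * Each process stands in for one worker (e.g., one GPU or one Docker
 * container); the float array stands in for a flattened gradient tensor. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                       /* assumed gradient length */
    float *grad = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) grad[i] = (float)rank;   /* worker's local gradient */

    /* Sum gradients across all workers in place, then average locally. */
    MPI_Allreduce(MPI_IN_PLACE, grad, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (int i = 0; i < n; i++) grad[i] /= (float)size;

    if (rank == 0)
        printf("averaged grad[0] = %f (world size %d)\n", grad[0], size);

    free(grad);
    MPI_Finalize();
    return 0;
}

Launched with, for example, "mpirun -np 4 ./allreduce_demo", one process per GPU mirrors the single-GPU-per-container allocation policy; the multiple-GPUs-per-container policy would instead manage several device buffers within one process.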
Pages: 911 - 931
Page count: 21