Interference-aware parallelization for deep learning workload in GPU cluster

Cited by: 19
Authors
Geng, Xin [1 ]
Zhang, Haitao [1 ]
Zhao, Zhengyang [1 ]
Ma, Huadong [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing Key Lab Intelligent Telecomm Software & M, Beijing 100876, Peoples R China
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2020, Vol. 23, Issue 4
Keywords
Deep learning; Workload parallelization; Deep collaborative filtering; Deep neural networks; Interference aware; Networks
DOI
10.1007/s10586-019-03037-6
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
With the widespread use of GPUs for deep learning applications, the efficient execution of multiple deep learning jobs in a GPU cluster has attracted great attention. Achieving efficient workload parallelization has become more difficult because modern GPUs support concurrent execution of multiple jobs. Traditional coarse-grained scheduling methods, which ignore both the interference caused by resource contention among co-executing jobs and the characteristics of deep learning jobs, can lead to unbalanced use of computing resources and further degrade job performance in the GPU cluster. In this paper, we propose a two-stage workload parallelization approach for deep learning training workloads. We first propose two interference-aware prediction models: the Interference-Aware Similarity Prediction (IASP) model based on deep collaborative filtering and the Interference-Aware Performance Prediction (IAPP) model based on a deep neural network. Our parallelization approach includes both a cluster-level and a node-level workload parallelization strategy. Specifically, the Cluster-Level Workload Parallelization (CLWP) strategy assigns deep learning jobs to appropriate worker nodes according to the proposed IASP model, and the Node-Level Workload Parallelization (NLWP) strategy places deep learning tasks on appropriate GPUs according to the proposed IAPP model and the communication costs among tasks. We evaluate our deep learning workload parallelization strategy on a prototype platform against other widely used methods. The experimental results show that the proposed strategy improves GPU utilization by 18% on average and reduces job completion time by around 22%.
Pages: 2689-2702
Page count: 14
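
To make the two-stage flow described in the abstract concrete, here is a minimal, hypothetical Python sketch. The paper's IASP and IAPP predictors are learned models (deep collaborative filtering and a deep neural network, respectively); the `iasp_slowdown`, `iapp_runtime`, and `comm_cost` functions below are toy stand-ins, and all names and the greedy placement logic are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class GPU:
    node: str
    idx: int
    tasks: list = field(default_factory=list)

def iasp_slowdown(job, resident_jobs):
    # Stand-in for the IASP model: predicted slowdown (>= 1.0) of `job`
    # when co-located with `resident_jobs` on the same worker node.
    return 1.0 + 0.1 * len(resident_jobs)  # toy heuristic, not the real model

def iapp_runtime(task, gpu):
    # Stand-in for the IAPP model: predicted runtime of `task` on `gpu`,
    # given the tasks already running there.
    return 100.0 * (1.0 + 0.2 * len(gpu.tasks))  # toy heuristic

def comm_cost(gpu_a, gpu_b):
    # Communication is free on the same GPU, cheap within a node,
    # expensive across nodes.
    if gpu_a is gpu_b:
        return 0.0
    return 5.0 if gpu_a.node == gpu_b.node else 20.0

def clwp(job, nodes):
    # Cluster level (CLWP, sketch): pick the worker node whose resident
    # jobs are predicted to interfere least with the incoming job.
    return min(nodes, key=lambda name: iasp_slowdown(job, nodes[name]))

def nlwp(tasks, gpus):
    # Node level (NLWP, sketch): greedily place each task on the GPU that
    # minimizes predicted runtime plus communication to placed siblings.
    placement = {}
    for t in tasks:
        best = min(gpus, key=lambda g: iapp_runtime(t, g)
                   + sum(comm_cost(g, placement[s]) for s in placement))
        placement[t] = best
        best.tasks.append(t)
    return placement

if __name__ == "__main__":
    nodes = {"node0": ["jobA", "jobB"], "node1": ["jobC"]}
    print(clwp("jobD", nodes))  # -> node1 (fewer co-located jobs)
    gpus = [GPU("node1", 0), GPU("node1", 1)]
    result = nlwp(["t0", "t1", "t2"], gpus)
    print({t: (g.node, g.idx) for t, g in result.items()})
```

The design point this sketch preserves is the split of decisions: the cluster-level step compares only predicted cross-job interference per node, while the node-level step trades predicted per-GPU performance against communication cost among a job's tasks.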