Machine Learning at the Edge: Efficient Utilization of Limited CPU/GPU Resources by Multiplexing

Cited by: 9
Authors
Dhakal, Aditya [1 ]
Kulkarni, Sameer G. [2 ]
Ramakrishnan, K. K. [1 ]
Affiliations
[1] Univ Calif Riverside, Riverside, CA 92521 USA
[2] Indian Inst Technol, Gandhinagar, Gujarat, India
Source
2020 IEEE 28TH INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS (IEEE ICNP 2020) | 2020
Keywords
GPU; Machine Learning; Deep Neural Networks; Inference
DOI
10.1109/icnp49622.2020.9259361
CLC Number
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Edge clouds can provide very responsive services to end-user devices that need more compute capability than they themselves have. But edge-cloud resources such as CPUs, and accelerators such as GPUs, are limited and must be shared across multiple concurrently running clients. Multiplexing a GPU across applications, however, is challenging. Further, edge servers are likely to have to process considerable amounts of streaming data, and getting that data from the network stream to the GPU can become a bottleneck that limits the work the GPU does. Finally, the lack of prompt notification of job completion from the GPU also leads to ineffective GPU utilization. We propose a framework that addresses these challenges as follows. We use spatial sharing of the GPU to multiplex it more efficiently. While spatial sharing can increase GPU utilization, the uncontrolled spatial sharing available in state-of-the-art systems such as CUDA MPS can cause interference between applications and hence unpredictable latency. Our framework instead uses controlled spatial sharing, which limits this cross-application interference. It uses the GPU's DMA engine to offload data transfers to the GPU, preventing the CPU from becoming a bottleneck while moving data from the network to the GPU. And it uses the CUDA event library to obtain timely, low-overhead notifications of GPU job completion. Preliminary experiments show that we can achieve low DNN inference latency and improve DNN inference throughput by a factor of ~1.4.
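The abstract names three mechanisms: controlled spatial sharing of the GPU, DMA-engine offload of data transfers, and CUDA events for completion notification. The sketch below is a minimal illustration of the latter two, not the authors' framework; infer_stage is a hypothetical stand-in for a DNN inference stage. Controlled spatial sharing itself is configured outside the program: under CUDA MPS, for instance, launching a client with the documented CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable (e.g. set to 25) caps the fraction of SMs it may occupy, which is the knob closest to the controlled sharing described here.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for one DNN inference stage (not from the paper).
__global__ void infer_stage(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 0.5f * in[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Pinned host buffers let the GPU's copy (DMA) engine move data
    // directly, without the CPU staging it through pageable memory.
    float *h_in, *h_out, *d_in, *d_out;
    cudaMallocHost((void**)&h_in, bytes);
    cudaMallocHost((void**)&h_out, bytes);
    cudaMalloc((void**)&d_in, bytes);
    cudaMalloc((void**)&d_out, bytes);
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

    // Asynchronous copies are executed by the DMA engine; the CPU returns
    // immediately and can keep servicing network I/O for other clients.
    cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, stream);
    infer_stage<<<(n + 255) / 256, 256, 0, stream>>>(d_in, d_out, n);
    cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, stream);

    // Record an event after the work in the stream; polling it is a
    // low-overhead completion notification, unlike a device-wide sync.
    cudaEventRecord(done, stream);
    while (cudaEventQuery(done) == cudaErrorNotReady) {
        // CPU is free to do other work here.
    }
    printf("out[0] = %f\n", h_out[0]);

    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
    cudaFree(d_in); cudaFree(d_out);
    cudaFreeHost(h_in); cudaFreeHost(h_out);
    return 0;
}

Polling cudaEventQuery (or registering a host callback with cudaLaunchHostFunc) gives the host a timely completion signal for one stream's work without blocking on cudaDeviceSynchronize, which is the kind of low-overhead notification the abstract attributes to the CUDA event library.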
Pages: 6
相关论文
共 20 条
  • [1] Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265
  • [2] [Anonymous], 2020, TorchServe
  • [3] [Anonymous], 2020, NVIDIA HYPER Q
  • [4] [Anonymous], 2018, USENIX WORKSH HOT TO
  • [5] [Anonymous], 2011, NVIDIA UNIVERSAL VIR
  • [6] [Anonymous], BIGLEARN NIPS WORKSH
  • [7] Chang H, 2014, IEEE CONF COMPUT, P346, DOI 10.1109/INFCOMW.2014.6849256
  • [8] Deep Learning With Edge Computing: A Review
    Chen, Jiasi
    Ran, Xukan
    [J]. PROCEEDINGS OF THE IEEE, 2019, 107 (08) : 1655 - 1674
  • [9] High Prevalence of Assisted Injection Among Street-Involved Youth in a Canadian Setting
    Cheng, Tessa
    Kerr, Thomas
    Small, Will
    Dong, Huiru
    Montaner, Julio
    Wood, Evan
    DeBeck, Kora
    [J]. AIDS AND BEHAVIOR, 2016, 20 (02) : 377 - 384
  • [10] Dhakal A, 2019, PROCEEDINGS OF THE 2019 IEEE CONFERENCE ON NETWORK SOFTWARIZATION (NETSOFT 2019), P396, DOI [10.1109/netsoft.2019.8806698, 10.1109/NETSOFT.2019.8806698]