Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs

Cited by: 8

Authors:

Pang, Weiguang [1]

Luo, Xiantong [1]

Chen, Kailun [1]

Ji, Dong [2]

Qiao, Lei [3]

Yi, Wang [1, 4]

Affiliations:

[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China

[2] Northeastern Univ, Natl Frontiers Sci Ctr Ind Intelligence & Syst Opt, Shenyang 110819, Peoples R China

[3] Beijing Inst Control Engn, Beijing, Peoples R China

[4] Uppsala Univ, Uppsala, Sweden

Source:

JOURNAL OF SYSTEMS ARCHITECTURE | 2023 / Vol. 139

Keywords:

DNN; Real-time scheduling; GPU; CUDA stream priority

DOI:

10.1016/j.sysarc.2023.102888

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

Deep Neural Networks (DNNs) are widely used in Cyber-Physical Systems (CPS), which often involve multiple DNN tasks with varying real-time requirements. These tasks need to be deployed on a single embedded hardware platform with limited resources, such as an embedded GPU. Efficiently sharing the same embedded GPU among multiple real-time DNN tasks is a complex challenge. While existing DNN frameworks (e.g., PyTorch and TensorFlow) focus on maximizing average performance and throughput on GPUs, they lack scheduling mechanisms that account for multiple DNNs with different timing requirements. In this paper, we address this challenge by thoroughly examining and summarizing the scheduling rules for multiple kernels with different priorities in CUDA streams. Based on these rules, we design a framework that supports multi-DNN real-time inference and propose a method for allocating CUDA streams to DNN kernels that meets schedulability requirements while maximizing GPU resource utilization. Our approach is implemented on an NVIDIA Jetson AGX Xavier embedded GPU system and validated using several popular DNNs. The results show that it achieves shorter response times than several state-of-the-art methods.
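To make the idea of prioritized kernels in CUDA streams concrete, the following is a minimal Python sketch of a deliberately simplified model. It relies only on the behavior documented in the CUDA Programming Guide (pending work in higher-priority streams takes preference over pending work in lower-priority streams; lower numbers mean higher priority) plus illustrative assumptions: the GPU executes one kernel at a time, kernels within a stream run in FIFO order, and a running kernel is never preempted. The names `Kernel` and `simulate`, the stream ids, and the timings are hypothetical; this is not the authors' framework, their summarized scheduling rules, or their stream-allocation method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Kernel:
    name: str
    stream: int      # hypothetical stream id
    release: float   # time the kernel is enqueued (illustrative units, e.g. ms)
    duration: float  # execution time, assumed fixed and known in advance


def simulate(kernels: List[Kernel], priority: Dict[int, int]) -> Dict[str, float]:
    """Return the completion time of every kernel under a simplified model.

    Model assumptions (illustrative only, not the paper's derived rules):
    - the GPU executes one kernel at a time (no concurrent kernels);
    - kernels in the same stream execute in enqueue (FIFO) order;
    - when the GPU is idle, it picks the head kernel of the highest-priority
      stream whose head has been released; as in CUDA, a lower number means
      a higher priority, and ties go to the earlier release;
    - a running kernel is never preempted.
    """
    # Build one FIFO queue per stream, ordered by release time.
    queues: Dict[int, List[Kernel]] = {}
    for kern in sorted(kernels, key=lambda x: x.release):
        queues.setdefault(kern.stream, []).append(kern)

    now = 0.0
    finish: Dict[str, float] = {}
    remaining = len(kernels)
    while remaining:
        # Only the head of each stream is eligible (FIFO within a stream).
        ready = [q[0] for q in queues.values() if q and q[0].release <= now]
        if not ready:
            # GPU idle: jump to the next release time.
            now = min(q[0].release for q in queues.values() if q)
            continue
        chosen = min(ready, key=lambda x: (priority[x.stream], x.release))
        queues[chosen.stream].pop(0)
        now += chosen.duration  # runs to completion without preemption
        finish[chosen.name] = now
        remaining -= 1
    return finish


if __name__ == "__main__":
    kernels = [
        Kernel("A1", stream=1, release=0.0, duration=4.0),
        Kernel("A2", stream=1, release=0.0, duration=4.0),
        Kernel("B1", stream=0, release=1.0, duration=2.0),
    ]
    # Stream 0 gets the higher priority (-1); stream 1 the lower one (0).
    print(simulate(kernels, {0: -1, 1: 0}))
    # → {'A1': 4.0, 'B1': 6.0, 'A2': 10.0}
```

In this toy run, the higher-priority kernel B1 completes at 6 instead of 10 (its completion time if both streams had equal priority), yet it still waits for A1 because the model never preempts a running kernel. Real GPUs can run kernels from different streams concurrently, so this sketch only illustrates priority ordering among pending work, not actual GPU response times.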

Pages: 11
