Improving GPU Multitasking Efficiency Using Dynamic Resource Sharing

Cited by: 20

Authors:

Kim, Jiho [1]

Cha, Jehee [1]

Park, Jason Jong Kyu [2]

Jeon, Dongsuk [3]

Park, Yongjun [4]

Affiliations:

[1] Hongik Univ, Seoul 04066, South Korea

[2] Univ Michigan, Ann Arbor, MI 48109 USA

[3] Seoul Natl Univ, Seoul 151742, South Korea

[4] Hanyang Univ, Seoul 04763, South Korea

Source:

IEEE COMPUTER ARCHITECTURE LETTERS | 2019 / Vol. 18 / Issue 01

Funding:

National Research Foundation of Singapore;

Keywords:

Computer architecture; GPUs; multi-programmed; resource sharing; spatial multitasking;

DOI:

10.1109/LCA.2018.2889042

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

As GPUs have become essential components of embedded computing systems, a GPU shared by multiple CPU cores needs to efficiently support the concurrent execution of multiple different applications. Spatial multitasking, which assigns different numbers of streaming multiprocessors (SMs) to the applications, is one of the most common solutions. However, it is not a panacea for maximizing total resource utilization, because an SM consists of many different sub-resources, such as caches, execution units, and scheduling units, and each kernel's requirements for these sub-resources are not well matched to their fixed sizes inside an SM. To solve this resource requirement mismatch problem, this paper proposes GPU Weaver, a dynamic sub-resource management system for multitasking GPUs. GPU Weaver maximizes sub-resource utilization through a shared resource controller (SRC) that is added between neighboring SMs. The SRC dynamically identifies idle sub-resources of an SM and allows the neighboring SM to use them when possible. Experiments show that combining multiple sub-resource borrowing techniques improves total throughput by up to 26 percent, and by 9.5 percent on average, over the baseline spatial multitasking GPU.
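The borrowing idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's SRC design. It assumes two neighboring SMs with identical fixed capacities, and the class `SM`, the function `share`, the resource names, and all numbers are hypothetical. It only shows how lending a neighbor's idle units can cover a per-kernel mismatch between demand and fixed capacity.

```python
from dataclasses import dataclass


@dataclass
class SM:
    """Hypothetical model of one SM: fixed capacities and the resident kernel's demand."""
    name: str
    capacity: dict  # sub-resource -> units physically inside this SM
    demand: dict    # sub-resource -> units the resident kernel could use

    def idle(self, res):
        # Units this SM owns but its own kernel does not need.
        return max(0, self.capacity[res] - self.demand[res])

    def shortfall(self, res):
        # Units the kernel wants beyond what this SM owns.
        return max(0, self.demand[res] - self.capacity[res])


def share(sm_a, sm_b):
    """Return the units each SM can use per sub-resource after borrowing idle units."""
    usable = {sm_a.name: {}, sm_b.name: {}}
    for res in sm_a.capacity:
        for borrower, lender in ((sm_a, sm_b), (sm_b, sm_a)):
            local = min(borrower.demand[res], borrower.capacity[res])
            borrowed = min(borrower.shortfall(res), lender.idle(res))
            usable[borrower.name][res] = local + borrowed
    return usable


# Assumed example: sm0 runs a cache-hungry kernel, sm1 a compute-hungry one.
sm0 = SM("sm0", {"cache_kb": 48, "exec_units": 32}, {"cache_kb": 80, "exec_units": 16})
sm1 = SM("sm1", {"cache_kb": 48, "exec_units": 32}, {"cache_kb": 16, "exec_units": 48})
result = share(sm0, sm1)
print(result)
```

In this assumed configuration each SM covers its shortfall entirely from the neighbor's idle units. A real controller would also have to handle contention, access latency, and reclaiming lent resources, none of which are modeled here.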
