Improving GPU Multitasking Efficiency Using Dynamic Resource Sharing

Times cited: 20
Authors
Kim, Jiho [1]
Cha, Jehee [1]
Park, Jason Jong Kyu [2]
Jeon, Dongsuk [3]
Park, Yongjun [4]
Affiliations
[1] Hongik Univ, Seoul 04066, South Korea
[2] Univ Michigan, Ann Arbor, MI 48109 USA
[3] Seoul Natl Univ, Seoul 151742, South Korea
[4] Hanyang Univ, Seoul 04763, South Korea
Funding
National Research Foundation, Singapore
Keywords
Computer architecture; GPUs; multi-programmed; resource sharing; spatial multitasking
DOI
10.1109/LCA.2018.2889042
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As GPUs have become essential components of embedded computing systems, a GPU shared by multiple CPU cores needs to efficiently support concurrent execution of multiple different applications. Spatial multitasking, which assigns different numbers of streaming multiprocessors (SMs) to different applications, is one of the most common solutions. However, it is not a panacea for maximizing total resource utilization, because an SM consists of many different sub-resources, such as caches, execution units, and scheduling units, and each kernel's requirements for these sub-resources are not well matched to their fixed sizes inside an SM. To solve this resource requirement mismatch problem, this paper proposes GPU Weaver, a dynamic sub-resource management system for multitasking GPUs. GPU Weaver can maximize sub-resource utilization through a shared resource controller (SRC) added between neighboring SMs. The SRC dynamically identifies idle sub-resources of an SM and allows the neighboring SM to use them when possible. Experiments show that combining multiple sub-resource borrowing techniques improves total throughput by up to 26 percent, and by 9.5 percent on average, over the baseline spatial multitasking GPU.
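The borrowing idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's SRC design: the `SM` class, the `share` and `utilization` helpers, the sub-resource names (`cache`, `alu`), and all unit counts are hypothetical, chosen only to show how idle sub-resources in one SM could cover a shortfall in its neighbor.

```python
from dataclasses import dataclass


@dataclass
class SM:
    """Toy streaming multiprocessor (hypothetical model, not from the paper)."""
    name: str
    capacity: dict  # sub-resource -> units physically present in this SM
    demand: dict    # sub-resource -> units the resident kernel would like

    def idle(self, res):
        # Units this SM owns but its own kernel does not need.
        return max(self.capacity[res] - self.demand[res], 0)

    def shortfall(self, res):
        # Units the kernel wants beyond what this SM owns.
        return max(self.demand[res] - self.capacity[res], 0)


def share(a, b):
    """Return {(lender, borrower, resource): units} for one neighboring SM pair."""
    loans = {}
    for lender, borrower in ((a, b), (b, a)):
        for res in lender.capacity:
            units = min(lender.idle(res), borrower.shortfall(res))
            if units > 0:
                loans[(lender.name, borrower.name, res)] = units
    return loans


def utilization(sm, loans):
    """Fraction of an SM's units in use, counting units lent to the neighbor."""
    used = sum(min(sm.demand[r], sm.capacity[r]) for r in sm.capacity)
    lent = sum(u for (lender, _, _), u in loans.items() if lender == sm.name)
    return (used + lent) / sum(sm.capacity.values())


# Assumed example: SM0 runs a cache-hungry kernel, SM1 a compute-hungry one.
sm0 = SM("SM0", {"cache": 4, "alu": 4}, {"cache": 6, "alu": 2})
sm1 = SM("SM1", {"cache": 4, "alu": 4}, {"cache": 1, "alu": 7})

loans = share(sm0, sm1)
print(loans)
print(utilization(sm0, {}), utilization(sm0, loans))
print(utilization(sm1, {}), utilization(sm1, loans))
```

In this assumed setup, each SM lends two units of the sub-resource its own kernel leaves idle, so both SMs end up with more of their hardware in use than under a fixed per-SM partition. The real controller would also have to handle timing, arbitration, and reclaiming lent resources, none of which this sketch models.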
Pages: 1-5
Page count: 5