Understanding the Performance of GPGPU Applications from a Data-Centric View

Cited by: 8

Authors:

Zhang, Hui [1]

Hollingsworth, Jeffrey K. [2]

Affiliations:

[1] Samsung Semicond Inc, Memory Solut Lab, San Jose, CA 95134 USA

[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA

Source:

PROCEEDINGS OF PROTOOLS 2019: 2019 IEEE/ACM INTERNATIONAL WORKSHOP ON PROGRAMMING AND PERFORMANCE VISUALIZATION TOOLS (PROTOOLS) | 2019

Keywords:

Data-Centric; CUDA; Benchmark optimization; Performance evaluation; GPGPU; Heterogeneous architectures;

DOI:

10.1109/ProTools49597.2019.00006

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

CPU-GPU hybrid computing is becoming a common configuration for supercomputers. The wide deployment of GPUs (as well as other hardware accelerators) raises a big question for the HPC community: are we using them effectively? Inappropriate use of GPUs can produce incorrect results in certain cases, but more often it slows the program down instead of speeding it up. This paper describes a tool that helps programmers analyze the runtime performance of kernels and gain insights for better GPU utilization. Compared to existing GPU performance tools, ours provides two unique features: data-centric profiling and the generation of complete GPU call stacks. With the guidance of the tool, we were able to improve the kernel performance of three widely studied GPU benchmarks by up to 46.6x with minor code modifications.

Pages: 1 / 8

Page count: 8
