Understanding the Performance of GPGPU Applications from a Data-Centric View

Cited by: 8
Authors
Zhang, Hui [1]
Hollingsworth, Jeffrey K. [2]
Affiliations
[1] Samsung Semicond Inc, Memory Solut Lab, San Jose, CA 95134 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20742 USA
Source
PROCEEDINGS OF PROTOOLS 2019: 2019 IEEE/ACM INTERNATIONAL WORKSHOP ON PROGRAMMING AND PERFORMANCE VISUALIZATION TOOLS (PROTOOLS) | 2019
Keywords
Data-Centric; CUDA; Benchmark optimization; Performance evaluation; GPGPU; Heterogeneous architectures
DOI
10.1109/ProTools49597.2019.00006
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Using a CPU-GPU hybrid computing framework is becoming a common configuration for supercomputers. The wide deployment of GPUs (as well as other hardware accelerators) brings to the HPC community a big question: Are we using them effectively? Inappropriate use of GPUs can generate incorrect results in certain cases, but more often, will slow down the program instead of speeding it up. This paper describes a tool that satisfies the needs of programmers to analyze the runtime performance of kernels and obtain insights for better GPU utilization. Compared to existing GPU performance tools, ours provides some unique features: data-centric profiling and generating complete GPU call stacks. With the guidance of the tool, we were able to improve the kernel performance of three widely-studied GPU benchmarks by a factor of up to 46.6x with minor code modification.
Pages: 1 / 8
Page count: 8