Identifying Optimization Opportunities Within Kernel Execution in GPU Codes

Cited by: 3
Authors
Lim, Robert [1]
Malony, Allen [1]
Norris, Boyana [1]
Chaimov, Nick [1]
Affiliations
[1] Univ Oregon, High Performance Comp Lab, Performance Res Lab, Eugene, OR 97403 USA
Source
EURO-PAR 2015: PARALLEL PROCESSING WORKSHOPS | 2015 / Vol. 9523
DOI
10.1007/978-3-319-27308-2_16
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Tuning codes for GPGPU architectures is challenging because few performance tools can pinpoint the exact causes of execution bottlenecks. While profiling applications can reveal execution behavior on a particular architecture, the abundance of collected information can also overwhelm the user. Moreover, performance counters provide cumulative values but do not attribute events to code regions, which makes identifying performance hot spots difficult. This research focuses on characterizing the behavior of GPU application kernels and their performance at the node level by providing a visualization and metrics display that indicates the behavior of the application with respect to the underlying architecture. We demonstrate the effectiveness of our techniques with LAMMPS and LULESH application case studies on a variety of GPU architectures. By sampling instruction mixes for kernel execution runs, we reveal a variety of intrinsic program characteristics relating to computation, memory and control flow.
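As a rough illustration of what an instruction mix is, the sketch below groups sampled opcodes into computation, memory and control-flow categories and reports each category's share. It is not the authors' tooling: the opcode names, the `CATEGORY_BY_OPCODE` table and the `instruction_mix` helper are hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical opcode-to-category table (an assumption for illustration,
# not taken from the paper or from any specific GPU instruction set manual).
CATEGORY_BY_OPCODE = {
    "FADD": "computation",
    "FMUL": "computation",
    "FFMA": "computation",
    "IADD": "computation",
    "LD": "memory",
    "ST": "memory",
    "BRA": "control",
    "EXIT": "control",
}


def instruction_mix(sampled_opcodes):
    """Return the fraction of sampled instructions in each category."""
    counts = Counter(
        CATEGORY_BY_OPCODE.get(op, "other") for op in sampled_opcodes
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Made-up sample of opcodes from one kernel run.
samples = ["FFMA", "FFMA", "LD", "ST", "BRA", "FADD", "LD", "FMUL"]
mix = instruction_mix(samples)
for category, share in sorted(mix.items()):
    print(f"{category}: {share:.3f}")
```

A kernel whose mix is dominated by the memory category would, under this simplified view, be a candidate for memory-access tuning, while a high control share hints at branch-heavy code.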
Pages: 185-196
Page count: 12
Related papers
17 in total
  • [1] A portable programming interface for performance evaluation on modern processors
    Browne, S
    Dongarra, J
    Garner, N
    Ho, G
    Mucci, P
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2000, 14 (03) : 189 - 204
  • [2] Effective Sampling-Driven Performance Tools for GPU-Accelerated Supercomputers
    Chabbi, Milind
    Murthy, Karthik
    Fagan, Michael
    Mellor-Crummey, John
    [J]. 2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,
  • [3] Dietrich R., 2010, 1 INT WORKSH PAR SOF
  • [4] Hartono A., 2009, INT S PAR DISTR PROC
  • [5] Hong S., 2009, ACM SIGARCH Computer Architecture News
  • [6] Karlin I., 2012, TECHNICAL REPORT
  • [7] Kerr A., 2009, INT S WORKL CHAR IIS
  • [8] Kim H., 2012, Performance Analysis and Tuning for General Purpose Graphics Processing Units, V1st
  • [9] Knüpfer A., 2008, TOOLS HIGH PERFORMAN
  • [10] Lim R., 2015, TECHNICAL REPORT