A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications

Cited by: 70
Authors
Sim, Jaewoong [1]
Dasgupta, Aniruddha
Kim, Hyesoon [1]
Vuduc, Richard [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Funding
U.S. National Science Foundation
Keywords
Measurement; Performance; CUDA; GPGPU architecture; Analytical model; Performance benefit prediction; Performance prediction;
DOI
10.1145/2370036.2145819
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most of the traditional tools, unfortunately, simply provide programmers with a variety of measurements and metrics obtained by running applications, and it is often difficult to map these metrics to understand the root causes of slowdowns, much less decide what next optimization step to take to alleviate the bottleneck. In our approach, we first develop an analytical performance model that can precisely predict performance and aims to provide programmer-interpretable metrics. Then, we apply static and dynamic profiling to instantiate our performance model for a particular input code and show how the model can predict the potential performance benefits. We demonstrate our framework on a suite of micro-benchmarks as well as a variety of computations extracted from real codes.
Pages: 11-21
Page count: 11