Measurement and analysis of GPU-accelerated applications with HPCToolkit

Cited by: 10
Authors
Zhou, Keren [1]
Adhianto, Laksono [1]
Anderson, Jonathon [1]
Cherian, Aaron [1]
Grubisic, Dejan [1]
Krentel, Mark [1]
Liu, Yumeng [1]
Meng, Xiaozhu [1]
Mellor-Crummey, John [1]
Affiliation
[1] Rice Univ, Dept Comp Sci, Houston, TX 77005 USA
Keywords
Supercomputers; High performance computing; Software performance; Performance analysis
DOI
10.1016/j.parco.2021.102837
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.
Pages: 12
Related papers
50 in total
  • [1] A Tool for Performance Analysis of GPU-Accelerated Applications
    Zhou, Keren
    Mellor-Crummey, John
    PROCEEDINGS OF THE 2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO '19), 2019, : 282 - 282
  • [2] Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs
    Cherian, Aaron Thomas
    Zhou, Keren
    Grubisic, Dejan
    Meng, Xiaozhu
    Mellor-Crummey, John
    PROCEEDINGS OF WORKSHOP ON PROGRAMMING AND PERFORMANCE VISUALIZATION TOOLS (PROTOOLS 2021), 2021, : 26 - 35
  • [3] A Tool for Bottleneck Analysis and Performance Prediction for GPU-accelerated Applications
    Madougou, Souley
    Varbanescu, Ana Lucia
    de Laat, Cees
    van Nieuwpoort, Rob
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 641 - 652
  • [4] Estimating the WCET of GPU-Accelerated Applications using Hybrid Analysis
    Betts, Adam
    Donaldson, Alastair
    PROCEEDINGS OF THE 2013 25TH EUROMICRO CONFERENCE ON REAL-TIME SYSTEMS (ECRTS 2013), 2013, : 193 - 202
  • [5] GPU-accelerated string matching for database applications
    Evangelia A. Sitaridi
    Kenneth A. Ross
    The VLDB Journal, 2016, 25 : 719 - 740
  • [6] A Performance Model for GPU-Accelerated FDTD Applications
    Baumeister, Paul F.
    Hater, Thorsten
    Kraus, Jiri
    Pleiter, Dirk
    Wahl, Pierre
    2015 IEEE 22ND INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2015, : 185 - 193
  • [7] GPU-Accelerated Static Timing Analysis
    Guo, Zizheng
    Huang, Tsung-Wei
    Lin, Yibo
    2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED-DESIGN (ICCAD), 2020,
  • [8] GPU-accelerated string matching for database applications
    Sitaridi, Evangelia A.
    Ross, Kenneth A.
    VLDB JOURNAL, 2016, 25 (05): : 719 - 740
  • [9] A tool for top-down performance analysis of GPU-accelerated applications
    Zhou, Keren
    Krentel, Mark
    Mellor-Crummey, John
    PROCEEDINGS OF THE 25TH ACM SIGPLAN SYMPOSIUM ON PRINCIPLES AND PRACTICE OF PARALLEL PROGRAMMING (PPOPP '20), 2020, : 415 - 416
  • [10] An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications
    Zhou, Keren
    Meng, Xiaozhu
    Sai, Ryuichi
    Grubisic, Dejan
    Mellor-Crummey, John
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (04) : 854 - 865