Multi-level Analysis of GPU Utilization in ML Training Workloads

Cited by: 0
Authors
Delestrac, Paul [1]
Bhattacharjee, Debjyoti [2]
Yang, Simei [2]
Moolchandani, Diksha [2]
Catthoor, Francky [2,3]
Torres, Lionel [1]
Novo, David [1]
Affiliations
[1] Univ Montpellier, CNRS, LIRMM, Montpellier, France
[2] IMEC, Leuven, Belgium
[3] Katholieke Univ Leuven, Leuven, Belgium
Source
2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2024
DOI
10.23919/DATE58400.2024.10546769
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Training time has become a critical bottleneck due to the recent proliferation of large-parameter ML models. GPUs continue to be the prevailing architecture for training ML models. However, the complex execution flow of ML frameworks makes it difficult to understand GPU computing resource utilization. Our main goal is to provide a better understanding of how efficiently ML training workloads use the computing resources of modern GPUs. To this end, we first describe an ideal reference execution of a GPU-accelerated ML training loop and identify relevant metrics that can be measured using existing profiling tools. Second, we produce a coherent integration of the traces obtained from each profiling tool. Third, we leverage the metrics within our integrated trace to analyze the impact of different software optimizations (e.g., mixed-precision, various ML frameworks, and execution modes) on the throughput and the associated utilization at multiple levels of hardware abstraction (i.e., whole GPU, SM subpartitions, issue slots, and tensor cores). In our results on two modern GPUs, we present seven takeaways and show that although close to 100% utilization is generally achieved at the GPU level, average utilization of the issue slots and tensor cores always remains below 50% and 5.2%, respectively.
Pages: 6
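
The abstract above outlines a multi-level measurement approach built on existing profiling tools. As a rough illustration of the coarser levels only, and not the authors' actual tooling, the following Python sketch samples whole-GPU utilization through NVML while the PyTorch profiler records kernel-level CUDA activity around a few training steps of a placeholder workload; the mixed-precision toggle mirrors one of the software optimizations the paper studies. The pynvml bindings, the model size, and the iteration count are illustrative assumptions.

# Illustrative sketch (not the paper's tooling): coarse whole-GPU utilization via
# NVML plus a kernel-level CUDA time breakdown from the PyTorch profiler, with an
# optional mixed-precision toggle. Model, data, and step counts are placeholders.
import torch
import torch.nn as nn
import pynvml
from torch.profiler import profile, ProfilerActivity

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)      # first GPU

use_amp = True                                     # toggle FP32 vs. mixed precision
model = nn.Linear(4096, 4096).cuda()               # placeholder training workload
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
data = torch.randn(256, 4096, device="cuda")
target = torch.randn(256, 4096, device="cuda")

gpu_util_samples = []
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):                            # a few training iterations
        optimizer.zero_grad()
        with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
            loss = nn.functional.mse_loss(model(data), target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        torch.cuda.synchronize()
        # Coarsest level: whole-GPU utilization as reported by NVML.
        gpu_util_samples.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)

print("NVML whole-GPU utilization samples (%):", gpu_util_samples)
# Finer level: per-kernel CUDA time breakdown from the framework profiler.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
pynvml.nvmlShutdown()

The finest levels reported in the paper (SM sub-partitions, issue slots, and tensor cores) are not visible from these Python-level APIs; they require hardware-counter profilers such as NVIDIA Nsight Compute.
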
Related papers (50 results; first 10 shown)
  • [1] Multi-level graph layout on the GPU
    Frishman, Yaniv
    Tal, Ayellet
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2007, 13 (06) : 1310 - 1317
  • [2] Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
    Jeon, Myeongjae
    Venkataraman, Shivaram
    Phanishayee, Amar
    Qian, Junjie
    Xiao, Wencong
    Yang, Fan
    PROCEEDINGS OF THE 2019 USENIX ANNUAL TECHNICAL CONFERENCE, 2019, : 947 - 960
  • [3] Multi-level analysis
    Lydersen, Stian
    TIDSSKRIFT FOR DEN NORSKE LAEGEFORENING, 2024, 144 (12)
  • [4] MLPPI Wizard: An Automated Multi-level Partitioning Tool on Analytical Workloads
    Suh, Young-Kyoon
    Crolotte, Alain
    Kostamaa, Pekka
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2018, 12 (04): 1693 - 1713
  • [5] Multi-Level Control and Utilization of Stormwater Runoff
    Zuo, Yuhang
    Luo, Hui
    Song, Mingzhi
    He, Baojie
    Cai, Bingxin
    Zhang, Wenhao
    Yang, Mingyu
APPLIED SCIENCES-BASEL, 2022, 12 (17)
  • [6] Multi-level Grid Based Clustering and GPU Parallel Implementations
    Qian, Quan
    Zhao, Shuai
    Xiao, Chao-Jie
    Hung, Che-Lun
    2017 14TH INTERNATIONAL SYMPOSIUM ON PERVASIVE SYSTEMS, ALGORITHMS AND NETWORKS & 2017 11TH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY & 2017 THIRD INTERNATIONAL SYMPOSIUM OF CREATIVE COMPUTING (ISPAN-FCST-ISCC), 2017, : 397 - 402
  • [7] Analysis of Crypto-Ransomware Using ML-Based Multi-Level Profiling
    Poudyal, Subash
    Dasgupta, Dipankar
    IEEE ACCESS, 2021, 9 : 122532 - 122547
  • [8] Scalable State Space Search on the GPU with Multi-Level Parallelism
    Shipovalov, Egor
    Pryanichnikov, Valentin
    2020 19TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC 2020), 2020, : 84 - 92
  • [9] Multi-level parallelism for incompressible flow computations on GPU clusters
    Jacobsen, Dana A.
    Senocak, Inanc
    PARALLEL COMPUTING, 2013, 39 (01) : 1 - 20
  • [10] A Multi-Level Approach to Link State: ML-OLSR
    Adjih, Cedric
    Plesse, Thierry
    PROCEEDINGS OF THE 10TH ACM INTERNATIONAL SYMPOSIUM ON MOBILITY MANAGEMENT AND WIRELESS ACCESS, 2012, : 87 - 96