Multi-level Analysis of GPU Utilization in ML Training Workloads

Cited by: 0
Authors
Delestrac, Paul [1]
Bhattacharjee, Debjyoti [2]
Yang, Simei [2]
Moolchandani, Diksha [2]
Catthoor, Francky [2,3]
Torres, Lionel [1]
Novo, David [1]
Affiliations
[1] Univ Montpellier, CNRS, LIRMM, Montpellier, France
[2] IMEC, Leuven, Belgium
[3] Katholieke Univ Leuven, Leuven, Belgium
Source
2024 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2024
DOI
10.23919/DATE58400.2024.10546769
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Subject classification code
0812
Abstract
Training time has become a critical bottleneck due to the recent proliferation of large-parameter ML models. GPUs continue to be the prevailing architecture for training ML models. However, the complex execution flow of ML frameworks makes it difficult to understand GPU computing resource utilization. Our main goal is to provide a better understanding of how efficiently ML training workloads use the computing resources of modern GPUs. To this end, we first describe an ideal reference execution of a GPU-accelerated ML training loop and identify relevant metrics that can be measured using existing profiling tools. Second, we produce a coherent integration of the traces obtained from each profiling tool. Third, we leverage the metrics within our integrated trace to analyze the impact of different software optimizations (e.g., mixed-precision, various ML frameworks, and execution modes) on the throughput and the associated utilization at multiple levels of hardware abstraction (i.e., whole GPU, SM subpartitions, issue slots, and tensor cores). In our results on two modern GPUs, we present seven takeaways and show that although close to 100% utilization is generally achieved at the GPU level, average utilization of the issue slots and tensor cores always remains below 50% and 5.2%, respectively.
Pages: 6
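
The abstract above outlines a multi-level measurement approach built on existing profiling tools. As a rough illustration of the coarser levels only, and not the authors' actual tooling, the following Python sketch samples whole-GPU utilization through NVML while the PyTorch profiler records kernel-level CUDA activity around a few training steps of a placeholder workload; the mixed-precision toggle mirrors one of the software optimizations the paper studies. The pynvml bindings, the model size, and the iteration count are illustrative assumptions.

# Illustrative sketch (not the paper's tooling): coarse whole-GPU utilization via
# NVML plus a kernel-level CUDA time breakdown from the PyTorch profiler, with an
# optional mixed-precision toggle. Model, data, and step counts are placeholders.
import torch
import torch.nn as nn
import pynvml
from torch.profiler import profile, ProfilerActivity

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)      # first GPU

use_amp = True                                     # toggle FP32 vs. mixed precision
model = nn.Linear(4096, 4096).cuda()               # placeholder training workload
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
data = torch.randn(256, 4096, device="cuda")
target = torch.randn(256, 4096, device="cuda")

gpu_util_samples = []
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):                            # a few training iterations
        optimizer.zero_grad()
        with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_amp):
            loss = nn.functional.mse_loss(model(data), target)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        torch.cuda.synchronize()
        # Coarsest level: whole-GPU utilization as reported by NVML.
        gpu_util_samples.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)

print("NVML whole-GPU utilization samples (%):", gpu_util_samples)
# Finer level: per-kernel CUDA time breakdown from the framework profiler.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
pynvml.nvmlShutdown()

The finest levels reported in the paper (SM sub-partitions, issue slots, and tensor cores) are not visible from these Python-level APIs; they require hardware-counter profilers such as NVIDIA Nsight Compute.
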
Related papers (50 results; first 10 shown)
  • [1] Multi-level graph layout on the GPU
    Frishman, Yaniv
    Tal, Ayellet
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2007, 13 (06) : 1310 - 1317
  • [2] Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
    Jeon, Myeongjae
    Venkataraman, Shivaram
    Phanishayee, Amar
    Qian, Junjie
    Xiao, Wencong
    Yang, Fan
    PROCEEDINGS OF THE 2019 USENIX ANNUAL TECHNICAL CONFERENCE, 2019, : 947 - 960
  • [3] Multi-level analysis
    Lydersen, Stian
    TIDSSKRIFT FOR DEN NORSKE LAEGEFORENING, 2024, 144 (12)
  • [4] MLPPI Wizard: An Automated Multi-level Partitioning Tool on Analytical Workloads
    Suh, Young-Kyoon
    Crolotte, Alain
    Kostamaa, Pekka
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2018, 12 (04): 1693 - 1713
  • [5] Multi-Level Control and Utilization of Stormwater Runoff
    Zuo, Yuhang
    Luo, Hui
    Song, Mingzhi
    He, Baojie
    Cai, Bingxin
    Zhang, Wenhao
    Yang, Mingyu
APPLIED SCIENCES-BASEL, 2022, 12 (17)
  • [6] Multi-level Grid Based Clustering and GPU Parallel Implementations
    Qian, Quan
    Zhao, Shuai
    Xiao, Chao-Jie
    Hung, Che-Lun
    2017 14TH INTERNATIONAL SYMPOSIUM ON PERVASIVE SYSTEMS, ALGORITHMS AND NETWORKS & 2017 11TH INTERNATIONAL CONFERENCE ON FRONTIER OF COMPUTER SCIENCE AND TECHNOLOGY & 2017 THIRD INTERNATIONAL SYMPOSIUM OF CREATIVE COMPUTING (ISPAN-FCST-ISCC), 2017, : 397 - 402
  • [7] Analysis of Crypto-Ransomware Using ML-Based Multi-Level Profiling
    Poudyal, Subash
    Dasgupta, Dipankar
    IEEE ACCESS, 2021, 9 : 122532 - 122547
  • [8] Scalable State Space Search on the GPU with Multi-Level Parallelism
    Shipovalov, Egor
    Pryanichnikov, Valentin
    2020 19TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC 2020), 2020, : 84 - 92
  • [9] Multi-level parallelism for incompressible flow computations on GPU clusters
    Jacobsen, Dana A.
    Senocak, Inanc
    PARALLEL COMPUTING, 2013, 39 (01) : 1 - 20
  • [10] A Multi-Level Approach to Link State: ML-OLSR
    Adjih, Cedric
    Plesse, Thierry
    PROCEEDINGS OF THE 10TH ACM INTERNATIONAL SYMPOSIUM ON MOBILITY MANAGEMENT AND WIRELESS ACCESS, 2012, : 87 - 96