MD-Roofline: A Training Performance Analysis Model for Distributed Deep Learning

Times cited: 2

Authors:

Miao, Tianhao [1, 2]

Wu, Qinghua [1, 4]

Liu, Ting [1, 2]

Cui, Penglai [1, 2]

Ren, Rui [1, 2]

Li, Zhenyu [1, 4]

Xie, Gaogang [3]

Affiliations:

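[1] Chinese Academy of Sciences, Institute of Computing Technology, Beijing, People's Republic of China

[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China

[3] Chinese Academy of Sciences, Computer Network Information Center, Beijing, People's Republic of China

[4] Purple Mountain Laboratories, Nanjing, People's Republic of China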

Source:

2022 27TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2022) | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

Distributed Training Performance; Straggler Diagnosis; Bottleneck Location; Roofline; OPERATIONS;

DOI:

10.1109/ISCC55528.2022.9912757

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Because Distributed Deep Learning (DDL) systems are large and complex, analyzing, diagnosing, and locating performance bottlenecks during training is an enormous challenge for AI researchers and operations engineers. Existing performance models and frameworks offer little insight into the performance degradation that a performance straggler induces. In this paper, we introduce MD-Roofline, a training performance analysis model that extends the traditional roofline model with a communication dimension. The model considers layer-wise attributes at the application level and a series of achievable peak performance metrics at the hardware level. With MD-Roofline, AI researchers and DDL operations engineers can locate the system bottleneck along three dimensions: intra-GPU computation capacity, intra-GPU memory access bandwidth, and inter-GPU communication bandwidth. We demonstrate that our performance analysis model provides valuable insight for bottleneck analysis when training 12 classic CNNs.
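To illustrate what locating a bottleneck along three dimensions can mean, the following Python sketch applies the classic roofline bound (attainable performance = min(peak compute, operational intensity × bandwidth)) to both intra-GPU memory traffic and inter-GPU traffic, then reports the lowest ceiling. This is a minimal sketch under stated assumptions: the names `HardwarePeaks`, `LayerProfile`, and `attainable_flops`, the per-layer counts, the hardware peaks, and the exact form of the communication ceiling are hypothetical illustrations, not MD-Roofline's published formulation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HardwarePeaks:
    # Hypothetical achievable peaks for one GPU and its interconnect.
    flops: float    # peak computation capacity, FLOP/s
    mem_bw: float   # intra-GPU memory access bandwidth, bytes/s
    comm_bw: float  # inter-GPU communication bandwidth, bytes/s


@dataclass
class LayerProfile:
    # Hypothetical per-layer counts for one training iteration.
    name: str
    flops: float       # floating-point operations performed by the layer
    mem_bytes: float   # bytes moved between GPU memory and compute units
    comm_bytes: float  # bytes exchanged with other GPUs (e.g., gradients)


def attainable_flops(layer: LayerProfile, hw: HardwarePeaks) -> Tuple[float, str]:
    """Return (attainable FLOP/s, limiting dimension) for one layer.

    Assumption: each dimension acts as an independent roofline ceiling,
    and the lowest ceiling bounds the layer's performance.
    """
    ceilings = {"compute": hw.flops}
    if layer.mem_bytes > 0:
        # Memory roof: FLOP per byte of memory traffic times memory bandwidth.
        ceilings["memory"] = (layer.flops / layer.mem_bytes) * hw.mem_bw
    if layer.comm_bytes > 0:
        # Communication roof (assumed form): FLOP per communicated byte
        # times inter-GPU bandwidth.
        ceilings["communication"] = (layer.flops / layer.comm_bytes) * hw.comm_bw
    bottleneck = min(ceilings, key=ceilings.get)
    return ceilings[bottleneck], bottleneck


if __name__ == "__main__":
    # Hypothetical peaks: 100 TFLOP/s compute, 900 GB/s memory, 10 GB/s network.
    hw = HardwarePeaks(flops=100e12, mem_bw=900e9, comm_bw=10e9)
    fc = LayerProfile("fc", flops=2e9, mem_bytes=4e8, comm_bytes=4e8)
    perf, limit = attainable_flops(fc, hw)
    print(f"{fc.name}: {perf:.3g} FLOP/s, bound by {limit}")
    # prints: fc: 5e+10 FLOP/s, bound by communication
```

Taking the minimum of the ceilings inherits the classic roofline assumption that computation, memory access, and communication overlap perfectly. If gradient communication is not overlapped with computation, summing the per-dimension times would give a more conservative estimate.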


Pages: 8
