MD-Roofline: A Training Performance Analysis Model for Distributed Deep Learning

Cited by: 2
Authors
Miao, Tianhao [1,2]
Wu, Qinghua [1,4]
Liu, Ting [1,2]
Cui, Penglai [1,2]
Ren, Rui [1,2]
Li, Zhenyu [1,4]
Xie, Gaogang [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
[4] Purple Mt Labs, Nanjing, Peoples R China
Source
2022 27TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
Distributed Training Performance; Straggler Diagnosis; Bottleneck Location; Roofline; OPERATIONS;
DOI
10.1109/ISCC55528.2022.9912757
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The size and complexity of Distributed Deep Learning (DDL) systems make it very challenging for AI researchers and operations engineers to analyze, diagnose, and locate performance bottlenecks during training. Existing performance models and frameworks offer little insight into the performance degradation that a straggler induces. In this paper, we introduce MD-Roofline, a training performance analysis model that extends the traditional roofline model with a communication dimension. The model considers layer-wise attributes at the application level and a series of achievable peak performance metrics at the hardware level. With MD-Roofline, AI researchers and DDL operations engineers can locate system bottlenecks along three dimensions: intra-GPU computation capacity, intra-GPU memory access bandwidth, and inter-GPU communication bandwidth. We demonstrate that our performance analysis model provides valuable insight for bottleneck analysis when training 12 classic CNNs.
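The abstract describes a roofline extended with a communication dimension, so that each layer is bounded by computation, memory bandwidth, or inter-GPU bandwidth. The sketch below is not the authors' formulation, which this record does not reproduce. It assumes the classic roofline form, in which a layer's attainable throughput is min(peak FLOP/s, memory bandwidth × arithmetic intensity, network bandwidth × communication intensity). This is equivalent to assuming that computation, memory traffic, and communication overlap perfectly. The names `Hardware`, `Layer`, and `attainable` and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Hardware:
    """Achievable peak metrics (hypothetical values supplied by the user)."""
    peak_flops: float  # FLOP/s: intra-GPU computation capacity
    mem_bw: float      # bytes/s: intra-GPU memory access bandwidth
    net_bw: float      # bytes/s: inter-GPU communication bandwidth


@dataclass
class Layer:
    """Hypothetical layer-wise attributes at the application level."""
    name: str
    flops: float       # floating-point operations per training iteration
    mem_bytes: float   # bytes read from / written to GPU memory
    comm_bytes: float  # bytes exchanged between GPUs (e.g. gradients)


def attainable(layer: Layer, hw: Hardware) -> Tuple[float, str]:
    """Return (roofline-bounded throughput in FLOP/s, limiting dimension).

    Assumes perfect overlap: time = max(compute, memory, communication time),
    so throughput = flops / time = min of the three roofline bounds.
    """
    bounds = {"compute": hw.peak_flops}
    if layer.mem_bytes > 0:
        # Memory roof: bandwidth x arithmetic intensity (FLOP per byte of memory traffic)
        bounds["memory"] = hw.mem_bw * layer.flops / layer.mem_bytes
    if layer.comm_bytes > 0:
        # Communication roof: bandwidth x communication intensity (FLOP per byte sent)
        bounds["communication"] = hw.net_bw * layer.flops / layer.comm_bytes
    bottleneck = min(bounds, key=bounds.get)
    return bounds[bottleneck], bottleneck


if __name__ == "__main__":
    hw = Hardware(peak_flops=100e12, mem_bw=1e12, net_bw=10e9)
    layers = [
        Layer("conv", flops=2e11, mem_bytes=1e8, comm_bytes=1e7),
        Layer("fc", flops=1e9, mem_bytes=5e8, comm_bytes=4e8),
    ]
    for layer in layers:
        perf, limit = attainable(layer, hw)
        print(f"{layer.name}: {perf:.3e} FLOP/s, bound by {limit}")
```

With these made-up numbers, the compute-heavy "conv" layer hits the compute roof. The parameter-heavy "fc" layer is limited by inter-GPU bandwidth, which is the kind of straggler the communication dimension is meant to expose.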
Pages: 8