MD-Roofline: A Training Performance Analysis Model for Distributed Deep Learning

Cited by: 2
Authors
Miao, Tianhao [1, 2]
Wu, Qinghua [1, 4]
Liu, Ting [1, 2]
Cui, Penglai [1, 2]
Ren, Rui [1, 2]
Li, Zhenyu [1, 4]
Xie, Gaogang [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
[4] Purple Mt Labs, Nanjing, Peoples R China
Source
2022 27TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
Distributed Training Performance; Straggler Diagnosis; Bottleneck Location; Roofline; OPERATIONS;
DOI
10.1109/ISCC55528.2022.9912757
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Distributed Deep Learning (DDL) systems are large and complex, which makes it very challenging for AI researchers and operations engineers to analyze, diagnose, and locate performance bottlenecks during the training stage. Existing performance models and frameworks offer little insight into the performance degradation that a straggler induces. In this paper, we introduce MD-Roofline, a training performance analysis model that extends the traditional roofline model with a communication dimension. The model considers layer-wise attributes at the application level and a series of achievable peak performance metrics at the hardware level. With MD-Roofline, AI researchers and DDL operations engineers can locate the system bottleneck along three dimensions: intra-GPU computation capacity, intra-GPU memory access bandwidth, and inter-GPU communication bandwidth. We demonstrate that the model provides valuable insights for bottleneck analysis when training 12 classic CNNs.
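As an illustration of the idea in the abstract, the following is a minimal sketch of a roofline bound with an added communication ceiling. This record does not give the paper's actual formulation, so the min-of-three-ceilings rule, the `Hardware` and `Layer` types, and the `attainable` function are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    # Hypothetical "achievable peak" metrics at the hardware level.
    peak_flops: float      # intra-GPU computation capacity, FLOP/s
    mem_bandwidth: float   # intra-GPU memory access bandwidth, bytes/s
    comm_bandwidth: float  # inter-GPU communication bandwidth, bytes/s


@dataclass
class Layer:
    # Hypothetical layer-wise attributes at the application level.
    flops: float       # floating-point operations performed by the layer
    mem_bytes: float   # bytes moved to and from GPU memory
    comm_bytes: float  # bytes exchanged between GPUs (0 if none)


def attainable(layer: Layer, hw: Hardware) -> tuple[float, str]:
    """Return (attainable FLOP/s, limiting dimension) for one layer.

    Assumption: each bandwidth ceiling is bandwidth * (FLOPs per byte),
    as in the classic roofline model, and the lowest ceiling wins.
    """
    ceilings = {
        "computation": hw.peak_flops,
        "memory": hw.mem_bandwidth * layer.flops / layer.mem_bytes,
    }
    # A layer that exchanges no data has no communication ceiling.
    if layer.comm_bytes > 0:
        ceilings["communication"] = (
            hw.comm_bandwidth * layer.flops / layer.comm_bytes
        )
    bottleneck = min(ceilings, key=ceilings.get)
    return ceilings[bottleneck], bottleneck
```

Calling `attainable` once per layer would give a layer-wise view of which of the three dimensions limits training, under the assumptions stated above.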
Pages: 8