Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Cited by: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; Natural gradient
DOI
10.1109/TCC.2022.3205918
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, which incurs large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that distributes the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence property of existing D-KFAC algorithms but also brings three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to state-of-the-art D-KFAC methods.
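As a rough illustration of the distributed-preconditioning idea summarized above, the following PyTorch-style sketch assigns each layer's Kronecker factors to one worker, which constructs and inverts them locally and broadcasts only the preconditioned gradient, so the KFs themselves are never communicated. This is a minimal sketch under stated assumptions, not the paper's implementation: the round-robin layer assignment, the damping value, the helper name precondition_step, and the premise that gradients have already been averaged across workers are all illustrative choices.

    # Minimal sketch of layer-wise distributed preconditioning (illustrative,
    # not the authors' code). Assumes torch.distributed is initialized and
    # layer.weight.grad already holds the data-parallel averaged gradient.
    import torch
    import torch.distributed as dist

    def precondition_step(layers, activations, grads_out, damping=0.03):
        """layers: list of nn.Linear modules (biases omitted for brevity);
        activations[i]: (N, d_in) inputs to layer i on this worker;
        grads_out[i]: (N, d_out) gradients w.r.t. layer i's outputs."""
        rank, world = dist.get_rank(), dist.get_world_size()
        for i, layer in enumerate(layers):
            owner = i % world              # round-robin layer-to-worker assignment (assumption)
            g = layer.weight.grad          # shape (d_out, d_in)
            if rank == owner:
                a, e = activations[i], grads_out[i]
                A = a.t() @ a / a.shape[0]  # input-side Kronecker factor
                G = e.t() @ e / e.shape[0]  # output-side Kronecker factor
                A_inv = torch.linalg.inv(A + damping * torch.eye(A.shape[0], device=A.device))
                G_inv = torch.linalg.inv(G + damping * torch.eye(G.shape[0], device=G.device))
                g.copy_(G_inv @ g @ A_inv)  # precondition the owner's layer locally
            # only the preconditioned gradient crosses the network, never the KFs
            dist.broadcast(g, src=owner)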
Pages: 2365 - 2378
Page count: 14
Related papers (50 total)
  • [1] Deep Neural Network Training With Distributed K-FAC
    Pauloski, J. Gregory
    Huang, Lei
    Xu, Weijia
    Chard, Kyle
    Foster, Ian T.
    Zhang, Zhao
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3616 - 3627
  • [2] Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
    Shi, Shaohuai
    Zhang, Lin
    Li, Bo
    2021 IEEE 41ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2021), 2021, : 550 - 560
  • [3] SoftMemoryBox II: A Scalable, Shared Memory Buffer Framework for Accelerating Distributed Training of Large-Scale Deep Neural Networks
    Ahn, Shinyoung
    Lim, Eunji
    IEEE ACCESS, 2020, 8 : 207097 - 207111
  • [4] Defenses Against Byzantine Attacks in Distributed Deep Neural Networks
    Xia, Qi
    Tao, Zeyi
    Li, Qun
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2021, 8 (03): 2025 - 2035
  • [5] Latent Weight Quantization for Integerized Training of Deep Neural Networks
    Fei, Wen
    Dai, Wenrui
    Zhang, Liang
    Zhang, Luoming
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 2816 - 2832
  • [6] DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
    Md, Vasimuddin
    Misra, Sanchit
    Ma, Guixiang
    Mohanty, Ramanarayan
    Georganas, Evangelos
    Heinecke, Alexander
    Kalamkar, Dhiraj
    Ahmed, Nesreen K.
    Avancha, Sasikanth
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [7] Approximate Fisher Information Matrix to Characterize the Training of Deep Neural Networks
    Liao, Zhibin
    Drummond, Tom
    Reid, Ian
    Carneiro, Gustavo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (01) : 15 - 26
  • [8] Scalable bio-inspired training of Deep Neural Networks with FastHebb
    Lagani, Gabriele
    Falchi, Fabrizio
    Gennaro, Claudio
    Fassold, Hannes
    Amato, Giuseppe
    NEUROCOMPUTING, 2024, 595
  • [9] A Hitchhiker's Guide On Distributed Training Of Deep Neural Networks
    Chahal, Karanbir Singh
    Grover, Manraj Singh
    Dey, Kuntal
    Shah, Rajiv Ratn
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 137 (137) : 65 - 76
  • [10] An In-Depth Analysis of Distributed Training of Deep Neural Networks
    Ko, Yunyong
    Choi, Kibong
    Seo, Jiwon
    Kim, Sang-Wook
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 994 - 1003