Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Times Cited: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; NATURAL GRADIENT;
DOI
10.1109/TCC.2022.3205918
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, resulting in large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that distributes the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence properties of existing D-KFAC algorithms but also provides three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that, compared to state-of-the-art D-KFAC methods, DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update.
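To make the scheme described in the abstract concrete: each layer's Kronecker factors are constructed and inverted by exactly one assigned worker, which applies them to that layer's gradient locally, so the factors themselves never need to be communicated. Below is a minimal single-process Python/NumPy sketch of this idea; it is not the authors' implementation, and the round-robin layer assignment, toy layer shapes, damping value, and helper names (assign_layers, kronecker_factors, precondition) are illustrative assumptions only.

# Conceptual sketch of layer-wise distributed preconditioning, simulated in one process.
import numpy as np

def assign_layers(num_layers, num_workers):
    # Hypothetical round-robin mapping: layer i is owned by worker i % num_workers.
    return {i: i % num_workers for i in range(num_layers)}

def kronecker_factors(acts, grads):
    # Build the two Kronecker factors A = E[a a^T] and G = E[g g^T] for one layer.
    A = acts.T @ acts / acts.shape[0]
    G = grads.T @ grads / grads.shape[0]
    return A, G

def precondition(grad_w, A, G, damping=1e-2):
    # K-FAC-style preconditioned gradient: (G + dI)^-1 @ grad_w @ (A + dI)^-1.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_w @ A_inv

num_workers, layer_dims = 4, [(32, 64), (64, 64), (64, 10)]
owner = assign_layers(len(layer_dims), num_workers)
batch = 128
rng = np.random.default_rng(0)

preconditioned = {}
for layer, (d_in, d_out) in enumerate(layer_dims):
    # On a real cluster, only worker owner[layer] would execute this block for the
    # layer it owns; the Kronecker factors A and G stay local to that worker.
    acts = rng.standard_normal((batch, d_in))        # layer inputs (activations)
    grads = rng.standard_normal((batch, d_out))      # back-propagated output gradients
    grad_w = grads.T @ acts / batch                  # layer weight gradient, shape (d_out, d_in)
    A, G = kronecker_factors(acts, grads)            # constructed only on the owner
    preconditioned[layer] = precondition(grad_w, A, G)

print(owner)
print({layer: mat.shape for layer, mat in preconditioned.items()})

In an actual cluster, presumably only the preconditioned gradients (which are the same size as the gradients themselves) would be exchanged among workers, which is consistent with the abstract's claim that no Kronecker factors are communicated.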
Pages: 2365 - 2378
Number of Pages: 14