Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Cited by: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; natural gradient
DOI
10.1109/TCC.2022.3205918
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, which incurs large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that assigns the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence property of existing D-KFAC algorithms but also brings three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to the state-of-the-art D-KFAC methods.
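The abstract builds on the standard K-FAC approximation, in which each layer's Fisher block factorizes into two small Kronecker factors that are inverted to precondition that layer's gradient. The sketch below is a minimal, hypothetical illustration of per-layer K-FAC preconditioning combined with a round-robin layer-to-worker assignment in the spirit of the distributed preconditioning described above; the function names, data layout, and assignment policy are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch: per-layer K-FAC preconditioning with a
# round-robin layer-to-worker assignment (not the authors' code).
# Assumes torch.distributed has already been initialized.
import torch
import torch.distributed as dist

def precondition_layer(grad_w, a, g, damping=0.03):
    """Precondition one linear layer's gradient with its Kronecker factors.

    grad_w : (out, in) gradient of the layer's weight matrix
    a      : (batch, in) layer inputs (activations)
    g      : (batch, out) gradients w.r.t. the layer's pre-activations
    """
    A = a.t() @ a / a.shape[0]   # input-side Kronecker factor
    G = g.t() @ g / g.shape[0]   # output-side Kronecker factor
    A_inv = torch.linalg.inv(A + damping * torch.eye(A.shape[0], device=A.device))
    G_inv = torch.linalg.inv(G + damping * torch.eye(G.shape[0], device=G.device))
    # (A ⊗ G)^{-1} vec(grad_w) == vec(G^{-1} grad_w A^{-1})
    return G_inv @ grad_w @ A_inv

def dp_kfac_step(layers, rank, world_size):
    """Each worker constructs KFs and preconditions only its assigned layers,
    then broadcasts the preconditioned gradients; KFs themselves are never
    communicated (the property highlighted in the abstract)."""
    for idx, layer in enumerate(layers):
        owner = idx % world_size  # round-robin ownership (illustrative assumption)
        if rank == owner:
            layer['grad'] = precondition_layer(layer['grad'], layer['a'], layer['g'])
        # share only the (much smaller) preconditioned gradient
        dist.broadcast(layer['grad'], src=owner)

In this sketch the only collective traffic per layer is one broadcast of a gradient-sized tensor, which mirrors the abstract's claim that distributing KF construction avoids communicating the factors themselves.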
Pages: 2365-2378
Number of pages: 14