Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Times Cited: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; NATURAL GRADIENT
DOI
10.1109/TCC.2022.3205918
Chinese Library Classification
TP [automation technology; computer technology]
Discipline Classification Code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, resulting in large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that distributes the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence property of existing D-KFAC algorithms but also enables three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to the state-of-the-art D-KFAC methods.
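The abstract describes the scheme only at a high level; the following is a minimal, hypothetical Python/NumPy sketch of layer-wise distributed preconditioning in the spirit described (each layer's Kronecker factors are built and inverted by exactly one worker, so the KFs themselves never need to be communicated). The assignment policy assign_owner, the helper precondition_layer, the damping constant, and the single-process worker loop are illustrative assumptions, not the paper's implementation; a real run would still exchange the resulting preconditioned gradients, which is only indicated by a comment.

```python
import numpy as np

# Hypothetical sketch of distributed preconditioning (DP-KFAC-style idea):
# one worker owns each layer, constructs and inverts its Kronecker factors
# locally, and only the preconditioned gradient would need to be exchanged.

DAMPING = 0.03  # illustrative Tikhonov damping added to both factors


def assign_owner(layer_idx: int, world_size: int) -> int:
    """Round-robin layer-to-worker assignment (an assumed placement policy)."""
    return layer_idx % world_size


def precondition_layer(acts, grads_out, grad_w):
    """Standard K-FAC preconditioning for one fully connected layer.

    acts:      (batch, in_dim)   layer inputs
    grads_out: (batch, out_dim)  gradients w.r.t. layer outputs
    grad_w:    (out_dim, in_dim) gradient of the weight matrix
    """
    batch = acts.shape[0]
    A = acts.T @ acts / batch            # input-side Kronecker factor
    G = grads_out.T @ grads_out / batch  # output-side Kronecker factor
    A_inv = np.linalg.inv(A + DAMPING * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + DAMPING * np.eye(G.shape[0]))
    return G_inv @ grad_w @ A_inv        # (G^-1) dW (A^-1)


# Simulate 4 workers and 6 layers in one process, for illustration only;
# in a real cluster each process would execute just its own rank's branch.
world_size, num_layers, batch = 4, 6, 32
rng = np.random.default_rng(0)
layers = [dict(acts=rng.normal(size=(batch, 8)),
               grads_out=rng.normal(size=(batch, 8)),
               grad_w=rng.normal(size=(8, 8))) for _ in range(num_layers)]

for rank in range(world_size):
    for idx, layer in enumerate(layers):
        if assign_owner(idx, world_size) != rank:
            continue                     # this rank never builds KFs for other layers
        update = precondition_layer(layer["acts"], layer["grads_out"],
                                    layer["grad_w"])
        # A real multi-GPU run would now broadcast (or reduce) only this
        # preconditioned gradient to the other workers, never the KFs.
        print(f"rank {rank} preconditioned layer {idx}, "
              f"norm={np.linalg.norm(update):.3f}")
```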
Pages: 2365-2378
Number of Pages: 14
Related Papers
50 records in total
  • [31] High Performance Training of Deep Neural Networks Using Pipelined Hardware Acceleration and Distributed Memory
    Mehta, Ragav
    Huang, Yuyang
    Cheng, Mingxi
    Bagga, Shrey
    Mathur, Nishant
    Li, Ji
    Draper, Jeffrey
    Nazarian, Shahin
    2018 19TH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN (ISQED), 2018, : 383 - 388
  • [32] Distributed B-SDLM: Accelerating the Training Convergence of Deep Neural Networks Through Parallelism
    Liew, Shan Sung
    Khalil-Hani, Mohamed
    Bakhteri, Rabia
    PRICAI 2016: TRENDS IN ARTIFICIAL INTELLIGENCE, 2016, 9810 : 243 - 250
  • [33] An efficient bandwidth-adaptive gradient compression algorithm for distributed training of deep neural networks
    Wang, Zeqin
    Duan, Qingyang
    Xu, Yuedong
    Zhang, Liang
    JOURNAL OF SYSTEMS ARCHITECTURE, 2024, 150
  • [34] The Impact of Architecture on the Deep Neural Networks Training
    Rozycki, Pawel
    Kolbusz, Janusz
    Malinowski, Aleksander
    Wilamowski, Bogdan
    2019 12TH INTERNATIONAL CONFERENCE ON HUMAN SYSTEM INTERACTION (HSI), 2019, : 41 - 46
  • [35] An Optimization Strategy for Deep Neural Networks Training
    Wu, Tingting
    Zeng, Peng
    Song, Chunhe
    2022 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, COMPUTER VISION AND MACHINE LEARNING (ICICML), 2022, : 596 - 603
  • [36] DANTE: Deep alternations for training neural networks
    Sinha, Vaibhav B.
    Kudugunta, Sneha
    Sankar, Adepu Ravi
    Chavali, Surya Teja
    Balasubramanian, Vineeth N.
    NEURAL NETWORKS, 2020, 131 : 127 - 143
  • [37] Heterogeneous gradient computing optimization for scalable deep neural networks
    Moreno-Alvarez, Sergio
    Paoletti, Mercedes E.
    Rico-Gallego, Juan A.
    Haut, Juan M.
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (11) : 13455 - 13469
  • [39] Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks
    Iiduka, Hideaki
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 13250 - 13261
  • [40] An Efficient Method for Training Deep Learning Networks Distributed
    Wang, Chenxu
    Lu, Yutong
    Chen, Zhiguang
    Li, Junnan
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (12) : 2444 - 2456