Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Cited by: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; Natural gradient
DOI
10.1109/TCC.2022.3205918
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, which incurs large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that distributes the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence property of existing D-KFAC algorithms but also brings three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to state-of-the-art D-KFAC methods.
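As a rough illustration of the distributed-preconditioning idea summarized above, the following PyTorch-style sketch assigns each layer's Kronecker factors to one worker, which constructs and inverts them locally and broadcasts only the preconditioned gradient, so the KFs themselves are never communicated. This is a minimal sketch under stated assumptions, not the paper's implementation: the round-robin layer assignment, the damping value, the helper name precondition_step, and the premise that gradients have already been averaged across workers are all illustrative choices.

    # Minimal sketch of layer-wise distributed preconditioning (illustrative,
    # not the authors' code). Assumes torch.distributed is initialized and
    # layer.weight.grad already holds the data-parallel averaged gradient.
    import torch
    import torch.distributed as dist

    def precondition_step(layers, activations, grads_out, damping=0.03):
        """layers: list of nn.Linear modules (biases omitted for brevity);
        activations[i]: (N, d_in) inputs to layer i on this worker;
        grads_out[i]: (N, d_out) gradients w.r.t. layer i's outputs."""
        rank, world = dist.get_rank(), dist.get_world_size()
        for i, layer in enumerate(layers):
            owner = i % world              # round-robin layer-to-worker assignment (assumption)
            g = layer.weight.grad          # shape (d_out, d_in)
            if rank == owner:
                a, e = activations[i], grads_out[i]
                A = a.t() @ a / a.shape[0]  # input-side Kronecker factor
                G = e.t() @ e / e.shape[0]  # output-side Kronecker factor
                A_inv = torch.linalg.inv(A + damping * torch.eye(A.shape[0], device=A.device))
                G_inv = torch.linalg.inv(G + damping * torch.eye(G.shape[0], device=G.device))
                g.copy_(G_inv @ g @ A_inv)  # precondition the owner's layer locally
            # only the preconditioned gradient crosses the network, never the KFs
            dist.broadcast(g, src=owner)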
Pages: 2365 - 2378
Page count: 14
Related papers (50 total)
  • [1] Deep Neural Network Training With Distributed K-FAC
    Pauloski, J. Gregory
    Huang, Lei
    Xu, Weijia
    Chard, Kyle
    Foster, Ian T.
    Zhang, Zhao
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 3616 - 3627
  • [2] Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
    Shi, Shaohuai
    Zhang, Lin
    Li, Bo
    2021 IEEE 41ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2021), 2021, : 550 - 560
  • [3] SoftMemoryBox II: A Scalable, Shared Memory Buffer Framework for Accelerating Distributed Training of Large-Scale Deep Neural Networks
    Ahn, Shinyoung
    Lim, Eunji
    IEEE ACCESS, 2020, 8 : 207097 - 207111
  • [4] Defenses Against Byzantine Attacks in Distributed Deep Neural Networks
    Xia, Qi
    Tao, Zeyi
    Li, Qun
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2021, 8 (03): 2025 - 2035
  • [5] Latent Weight Quantization for Integerized Training of Deep Neural Networks
    Fei, Wen
    Dai, Wenrui
    Zhang, Liang
    Zhang, Luoming
    Li, Chenglin
    Zou, Junni
    Xiong, Hongkai
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 2816 - 2832
  • [6] DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks
    Md, Vasimuddin
    Misra, Sanchit
    Ma, Guixiang
    Mohanty, Ramanarayan
    Georganas, Evangelos
    Heinecke, Alexander
    Kalamkar, Dhiraj
    Ahmed, Nesreen K.
    Avancha, Sasikanth
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [7] Approximate Fisher Information Matrix to Characterize the Training of Deep Neural Networks
    Liao, Zhibin
    Drummond, Tom
    Reid, Ian
    Carneiro, Gustavo
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (01) : 15 - 26
  • [8] Scalable bio-inspired training of Deep Neural Networks with FastHebb
    Lagani, Gabriele
    Falchi, Fabrizio
    Gennaro, Claudio
    Fassold, Hannes
    Amato, Giuseppe
    NEUROCOMPUTING, 2024, 595
  • [9] A Hitchhiker's Guide On Distributed Training Of Deep Neural Networks
    Chahal, Karanbir Singh
    Grover, Manraj Singh
    Dey, Kuntal
    Shah, Rajiv Ratn
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 137 (137) : 65 - 76
  • [10] An In-Depth Analysis of Distributed Training of Deep Neural Networks
    Ko, Yunyong
    Choi, Kibong
    Seo, Jiwon
    Kim, Sang-Wook
    2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2021, : 994 - 1003