Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Cited by: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; natural gradient
DOI
10.1109/TCC.2022.3205918
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, which incurs large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that assigns the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence property of existing D-KFAC algorithms but also brings three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update compared to the state-of-the-art D-KFAC methods.
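The abstract builds on the standard K-FAC approximation, in which each layer's Fisher block factorizes into two small Kronecker factors that are inverted to precondition that layer's gradient. The sketch below is a minimal, hypothetical illustration of per-layer K-FAC preconditioning combined with a round-robin layer-to-worker assignment in the spirit of the distributed preconditioning described above; the function names, data layout, and assignment policy are assumptions for illustration, not the authors' implementation.

# Hypothetical sketch: per-layer K-FAC preconditioning with a
# round-robin layer-to-worker assignment (not the authors' code).
# Assumes torch.distributed has already been initialized.
import torch
import torch.distributed as dist

def precondition_layer(grad_w, a, g, damping=0.03):
    """Precondition one linear layer's gradient with its Kronecker factors.

    grad_w : (out, in) gradient of the layer's weight matrix
    a      : (batch, in) layer inputs (activations)
    g      : (batch, out) gradients w.r.t. the layer's pre-activations
    """
    A = a.t() @ a / a.shape[0]   # input-side Kronecker factor
    G = g.t() @ g / g.shape[0]   # output-side Kronecker factor
    A_inv = torch.linalg.inv(A + damping * torch.eye(A.shape[0], device=A.device))
    G_inv = torch.linalg.inv(G + damping * torch.eye(G.shape[0], device=G.device))
    # (A ⊗ G)^{-1} vec(grad_w) == vec(G^{-1} grad_w A^{-1})
    return G_inv @ grad_w @ A_inv

def dp_kfac_step(layers, rank, world_size):
    """Each worker constructs KFs and preconditions only its assigned layers,
    then broadcasts the preconditioned gradients; KFs themselves are never
    communicated (the property highlighted in the abstract)."""
    for idx, layer in enumerate(layers):
        owner = idx % world_size  # round-robin ownership (illustrative assumption)
        if rank == owner:
            layer['grad'] = precondition_layer(layer['grad'], layer['a'], layer['g'])
        # share only the (much smaller) preconditioned gradient
        dist.broadcast(layer['grad'], src=owner)

In this sketch the only collective traffic per layer is one broadcast of a gradient-sized tensor, which mirrors the abstract's claim that distributing KF construction avoids communicating the factors themselves.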
Pages: 2365-2378
Number of pages: 14