Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning

Times Cited: 0
Authors
Zhang, Lin [1 ]
Shi, Shaohuai [2 ]
Wang, Wei [1 ]
Li, Bo [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Guangdong, Peoples R China
Keywords
Training; Computational modeling; Clustering algorithms; Graphics processing units; Memory management; Deep learning; Convergence; Distributed deep learning; K-FAC; performance optimization; second-order; NATURAL GRADIENT;
DOI
10.1109/TCC.2022.3205918
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Second-order optimization methods, notably D-KFAC (Distributed Kronecker-Factored Approximate Curvature) algorithms, have gained traction for accelerating deep neural network (DNN) training on GPU clusters. However, existing D-KFAC algorithms must compute and communicate a large volume of second-order information, i.e., Kronecker factors (KFs), before preconditioning gradients, resulting in large computation and communication overheads as well as a high memory footprint. In this article, we propose DP-KFAC, a novel distributed preconditioning scheme that distributes the KF construction tasks of different DNN layers to different workers. DP-KFAC not only retains the convergence properties of existing D-KFAC algorithms but also provides three benefits: reduced computation overhead in constructing KFs, no communication of KFs, and a low memory footprint. Extensive experiments on a 64-GPU cluster show that, compared to state-of-the-art D-KFAC methods, DP-KFAC reduces the computation overhead by 1.55x-1.65x, the communication cost by 2.79x-3.15x, and the memory footprint by 1.14x-1.47x in each second-order update.
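To make the scheme described in the abstract concrete: each layer's Kronecker factors are constructed and inverted by exactly one assigned worker, which applies them to that layer's gradient locally, so the factors themselves never need to be communicated. Below is a minimal single-process Python/NumPy sketch of this idea; it is not the authors' implementation, and the round-robin layer assignment, toy layer shapes, damping value, and helper names (assign_layers, kronecker_factors, precondition) are illustrative assumptions only.

# Conceptual sketch of layer-wise distributed preconditioning, simulated in one process.
import numpy as np

def assign_layers(num_layers, num_workers):
    # Hypothetical round-robin mapping: layer i is owned by worker i % num_workers.
    return {i: i % num_workers for i in range(num_layers)}

def kronecker_factors(acts, grads):
    # Build the two Kronecker factors A = E[a a^T] and G = E[g g^T] for one layer.
    A = acts.T @ acts / acts.shape[0]
    G = grads.T @ grads / grads.shape[0]
    return A, G

def precondition(grad_w, A, G, damping=1e-2):
    # K-FAC-style preconditioned gradient: (G + dI)^-1 @ grad_w @ (A + dI)^-1.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_w @ A_inv

num_workers, layer_dims = 4, [(32, 64), (64, 64), (64, 10)]
owner = assign_layers(len(layer_dims), num_workers)
batch = 128
rng = np.random.default_rng(0)

preconditioned = {}
for layer, (d_in, d_out) in enumerate(layer_dims):
    # On a real cluster, only worker owner[layer] would execute this block for the
    # layer it owns; the Kronecker factors A and G stay local to that worker.
    acts = rng.standard_normal((batch, d_in))        # layer inputs (activations)
    grads = rng.standard_normal((batch, d_out))      # back-propagated output gradients
    grad_w = grads.T @ acts / batch                  # layer weight gradient, shape (d_out, d_in)
    A, G = kronecker_factors(acts, grads)            # constructed only on the owner
    preconditioned[layer] = precondition(grad_w, A, G)

print(owner)
print({layer: mat.shape for layer, mat in preconditioned.items()})

In an actual cluster, presumably only the preconditioned gradients (which are the same size as the gradients themselves) would be exchanged among workers, which is consistent with the abstract's claim that no Kronecker factors are communicated.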
Pages: 2365 - 2378
Number of Pages: 14