Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method

Cited by: 3

Authors:

Tsuji, Yohei [1]

Osawa, Kazuki [1]

Ueno, Yuichiro [1]

Naruse, Akira [2]

Yokota, Rio [3]

Matsuoka, Satoshi [4]

Affiliations:

[1] Tokyo Inst Technol, Tokyo, Japan

[2] NVIDIA, Tokyo, Japan

[3] AIST, AIST Tokyo Tech RWBC OIL, Tokyo Inst Technol, Global Sci Informat & Comp Ctr, Tokyo, Japan

[4] Tokyo Inst Technol, RIKEN, Ctr Computat Sci, Kobe, Hyogo, Japan

Source:

PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019) | 2019

Keywords:

deep learning; second-order optimization; neural networks;

DOI:

10.1145/3339186.3339202

Chinese Library Classification:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Faster training of deep neural networks is desired to speed up the research and development cycle in deep learning. Distributed deep learning and second-order optimization methods are two different techniques for accelerating the training of deep neural networks. Previous work showed that an approximated second-order optimization method called K-FAC can mitigate the drawbacks of the two techniques. However, there was no detailed discussion of its performance, which is critical for practical use. In this work, we propose several performance optimization techniques to reduce the overheads of K-FAC and accelerate overall training. With all performance optimizations applied, we speed up training by 1.64 times per iteration compared to a baseline. In addition to the performance optimizations, we construct a simple performance model that predicts training performance, helping users determine whether distributed K-FAC is appropriate for their training in terms of wall-clock time.


Pages: 8
