Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method

Cited by: 3
Authors
Tsuji, Yohei [1]
Osawa, Kazuki [1]
Ueno, Yuichiro [1]
Naruse, Akira [2]
Yokota, Rio [3]
Matsuoka, Satoshi [4]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] NVIDIA, Tokyo, Japan
[3] AIST, AIST Tokyo Tech RWBC OIL, Tokyo Inst Technol, Global Sci Informat & Comp Ctr, Tokyo, Japan
[4] Tokyo Inst Technol, RIKEN, Ctr Computat Sci, Kobe, Hyogo, Japan
Source
PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS (ICPP 2019) | 2019
Keywords
deep learning; second-order optimization; neural networks
DOI
10.1145/3339186.3339202
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Faster training of deep neural networks is desired to speed up the research and development cycle in deep learning. Distributed deep learning and second-order optimization methods are two different techniques for accelerating the training of deep neural networks. Previous work showed that an approximated second-order optimization method called K-FAC allows the two techniques to mitigate each other's drawbacks. However, there was no detailed discussion of its performance, which is critical for practical use. In this work, we propose several performance optimization techniques to reduce the overheads of K-FAC and to accelerate the overall training. With all performance optimizations applied, we speed up training by a factor of 1.64 per iteration compared to a baseline. In addition to the performance optimizations, we construct a simple performance model that predicts training performance, to help users determine whether distributed K-FAC is appropriate for their training in terms of wall-clock time.
Pages: 8
Related papers
50 in total
  • [1] An Accelerated Second-Order Method for Distributed Stochastic Optimization
    Agafonov, Artem
    Dvurechensky, Pavel
    Scutari, Gesualdo
    Gasnikov, Alexander
    Kamzolov, Dmitry
    Lukashevich, Aleksandr
    Daneshmand, Amir
    2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021 : 2407 - 2413
  • [2] A Stochastic Second-Order Proximal Method for Distributed Optimization
    Qiu, Chenyang
    Zhu, Shanying
    Ou, Zichong
    Lu, Jie
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 1405 - 1410
  • [3] INVERSE RELIABILITY ANALYSIS FOR APPROXIMATED SECOND-ORDER RELIABILITY METHOD USING HESSIAN UPDATE
    Lim, Jongmin
    Lee, Byungchai
    Lee, Ikjin
    PROCEEDINGS OF THE ASME INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, 2014, VOL 2B, 2014
  • [4] Second-order Derivative Optimization Methods in Deep Learning Neural Networks
    Lim, Si Yong
    Lim, King Hann
    2022 INTERNATIONAL CONFERENCE ON GREEN ENERGY, COMPUTING AND SUSTAINABLE TECHNOLOGY (GECOST), 2022 : 470 - 475
  • [5] A DEEP LEARNING GALERKIN METHOD FOR THE SECOND-ORDER LINEAR ELLIPTIC EQUATIONS
    Li, Jian
    Zhang, Wen
    Yue, Jing
    INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, 2021, 18 (04) : 427 - 441
  • [6] Distributed Optimization Control for the System with Second-Order Dynamic
    Wang, Yueqing
    Zhang, Hao
    Li, Zhi
    MATHEMATICS, 2024, 12 (21)
  • [7] Linearly Convergent Second-Order Distributed Optimization Algorithms
    Qu, Zhihai
    Li, Xiuxian
    Li, Li
    Hong, Yiguang
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (08) : 5431 - 5438
  • [8] Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis
    Lin, Dachao
    Han, Yuze
    Ye, Haishan
    Zhang, Zhihua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [9] A Second-Order Projected Primal-Dual Dynamical System for Distributed Optimization and Learning
    Wang, Xiaoxuan
    Yang, Shaofu
    Guo, Zhenyuan
    Huang, Tingwen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 6568 - 6577
  • [10] A Decentralized Second-Order Method for Dynamic Optimization
    Mokhtari, Aryan
    Shi, Wei
    Ling, Qing
    Ribeiro, Alejandro
    2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), 2016 : 6036 - 6043