Achieving small-batch accuracy with large-batch scalability via Hessian-aware learning rate adjustment

Cited by: 5
Authors
Lee, Sunwoo [1, 2]
He, Chaoyang [1]
Avestimehr, Salman [1]
Affiliations
[1] Univ Southern Calif, 3740 McClintock Ave, Los Angeles, CA 90007 USA
[2] Inha Univ, 100 Inha Ro, Incheon 22212, South Korea
Keywords
Deep learning; Large-batch training; Hessian information; Learning rate adjustment
DOI
10.1016/j.neunet.2022.11.007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider synchronous data-parallel neural network training with a fixed large batch size. While the large batch size provides a high degree of parallelism, it degrades the generalization performance due to the low gradient noise scale. We propose a general learning rate adjustment framework and three critical heuristics that tackle the poor generalization issue. The key idea is to adjust the learning rate based on geometric information of loss landscape and encourage the model to converge into a flat minimum that is known to better generalize to the unknown data. Our empirical study demonstrates that the Hessian-aware learning rate schedule remarkably improves the generalization performance in large-batch training. For CIFAR-10 classification with ResNet20, our method achieves 92.31% accuracy using 16,384 batch size, which is close to 92.83% achieved using 128 batch size, at a negligible extra computational cost. (c) 2022 Elsevier Ltd. All rights reserved.
Pages: 1-14
Page count: 14
Related papers
38 in total
  • [1] Akiba T, 2017, Arxiv, DOI arXiv:1711.04325
  • [2] Balles L, 2017, Arxiv, DOI arXiv:1612.05086
  • [3] Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
  • [4] Devarakonda A., 2017, arXiv
  • [5] Foret P, 2021, Arxiv, DOI [arXiv:2010.01412, 10.48550/arXiv.2010.01412]
  • [6] Ge R., 2019, ADV NEURAL INFORM PR, P14977
  • [7] Goyal P, 2018, Arxiv, DOI arXiv:1706.02677
  • [8] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [9] Bag of Tricks for Image Classification with Convolutional Neural Networks
    He, Tong
    Zhang, Zhi
    Zhang, Hang
    Zhang, Zhongyue
    Xie, Junyuan
    Li, Mu
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 558 - 567
  • [10] Hoffer E, 2017, ADV NEUR IN, V30