Binary Quantized Network Training With Sharpness-Aware Minimization

Cited by: 3
Authors
Liu, Ren [1]
Bian, Fengmiao [2]
Zhang, Xiaoqun [3]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Math Sci, Shanghai 200240, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Math, Clear Water Bay, Hong Kong, Peoples R China
[3] Shanghai Jiao Tong Univ, Inst Nat Sci, Sch Math Sci, MOE,LSC, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Binary quantized network; Sharpness-aware minimization; Neural networks
DOI
10.1007/s10915-022-02064-7
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Quantizing neural networks is a common way to improve the inference and memory efficiency of deep learning methods. However, solving the resulting optimization problem with good generalization is challenging because of its high nonlinearity and nonconvexity. This paper proposes an algorithm for training one-bit quantized neural networks based on sharpness-aware minimization (Foret et al., Sharpness-aware minimization for efficiently improving generalization, 2021) with two types of gradient approximation. The idea is to improve generalization by finding a local minimum in a flat region of the loss landscape of both the continuous and the quantized neural network. Convergence theory is partially established in the one-bit quantization setting. Experiments on the CIFAR-10 (Krizhevsky, Learning multiple layers of features from tiny images, 2009) and SVHN (Netzer et al., Reading digits in natural images with unsupervised feature learning, 2011) datasets show that the proposed algorithm improves generalization compared with other state-of-the-art quantized training methods.
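The general idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify its two gradient approximations, so this example assumes a plain straight-through estimator (the gradient taken at the binarized weights is applied to the latent real-valued weights) and a toy linear least-squares model. The names `sign`, `loss_and_grad`, `sam_binary_step`, `lr`, and `rho` are all hypothetical.

```python
import math

def sign(v):
    # Binarize to {-1, +1}; mapping zero to +1 is an assumed convention.
    return [1.0 if x >= 0 else -1.0 for x in v]

def loss_and_grad(w, data):
    """Mean of 0.5 * squared error for a linear model w.x, with its gradient in w."""
    n = len(data)
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss += 0.5 * err * err / n
        for i, xi in enumerate(x):
            grad[i] += err * xi / n
    return loss, grad

def sam_binary_step(w, data, lr=0.1, rho=0.05):
    """One SAM-style update of latent weights w whose forward pass uses sign(w)."""
    # Straight-through estimator (assumption): gradient at sign(w) is used for w.
    _, g = loss_and_grad(sign(w), data)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # avoid division by zero
    # SAM ascent step: move the latent weights a distance rho along the gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Gradient at the binarized perturbed point drives the actual descent step.
    _, g_adv = loss_and_grad(sign(w_adv), data)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy usage: the targets are fit exactly by the binary weights [+1, -1].
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
w = [0.3, 0.2]
for _ in range(10):
    w = sam_binary_step(w, data)
final_loss, _ = loss_and_grad(sign(w), data)
```

In this toy setting the second latent weight is pushed across zero, after which the binarized model fits the data and the gradient vanishes. A small `rho` rarely flips a sign on its own, which hints at why sharpness-aware training of binary networks needs more careful gradient approximations than this sketch uses.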
Pages: 26
Related papers
39 in total
  • [1] Ashok A, 2017, Arxiv, DOI arXiv:1709.06030
  • [2] WEIGHT QUANTIZATION IN BOLTZMANN MACHINES
    BALZER, W
    TAKAHASHI, M
    OHTA, J
    KYUMA, K
    [J]. NEURAL NETWORKS, 1991, 4 (03) : 405 - 409
  • [3] Carreira-Perpiñán MA, 2017, Arxiv, DOI arXiv:1707.04319
  • [4] Courbariaux M, 2015, ADV NEUR IN, V28
  • [5] Fallah A, 2020, PR MACH LEARN RES, V108, P1082
  • [6] FIESLER E, 1990, P SOC PHOTO-OPT INS, V1281, P164, DOI 10.1117/12.20700
  • [7] Finn C, 2017, PR MACH LEARN RES, V70
  • [8] Foret P, 2021, Arxiv, DOI [arXiv:2010.01412, 10.48550/arXiv.2010.01412]
  • [9] He K., 2016, INDIAN J CHEM B
  • [10] Huang YP, 2019, Arxiv, DOI [arXiv:1811.06965, 10.48550/arXiv.1811.06965]