Stochastic Markov gradient descent and training low-bit neural networks

Cited by: 0
Authors
Ashbrock, Jonathan [1]
Powell, Alexander M. [2]
Affiliations
[1] MITRE Corp, McLean, VA 22102 USA
[2] Vanderbilt Univ, Dept Math, Nashville, TN 37240 USA
Source
SAMPLING THEORY SIGNAL PROCESSING AND DATA ANALYSIS | 2021, Vol. 19, No. 2
Keywords
Neural networks; Quantization; Stochastic gradient descent; Stochastic Markov gradient descent; Low-memory training;
DOI
10.1007/s43670-021-00015-1
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
The massive size of modern neural networks has motivated substantial recent interest in neural network quantization, especially low-bit quantization. We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks. The SMGD algorithm is designed for settings where memory is highly constrained during training. We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
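The abstract does not spell out the update rule, but a minimal sketch of a stochastic, Markov-style quantized update in the spirit of SMGD might look as follows (Python/NumPy; the grid spacing delta, the step size, and the move-probability rule are illustrative assumptions, not the authors' exact scheme):

import numpy as np

def smgd_step(w, grad, lr=0.01, delta=0.1, rng=None):
    # Illustrative quantized update (assumed form, not the paper's exact rule):
    # every weight stays on the grid {k * delta}. A weight moves one grid level
    # against the sign of its gradient with probability proportional to the
    # scaled gradient magnitude (capped at 1); otherwise it stays put, so all
    # iterates remain quantized throughout training.
    rng = np.random.default_rng() if rng is None else rng
    p_move = np.clip(lr * np.abs(grad) / delta, 0.0, 1.0)
    move = rng.random(w.shape) < p_move
    return w - delta * np.sign(grad) * move

# Toy usage: weights start at 0 and only ever occupy multiples of delta.
w = np.zeros(5)
g = np.array([0.3, -0.8, 0.05, 1.2, -0.4])
w = smgd_step(w, g)

Because an update of this form stores only the quantized weights rather than a full-precision shadow copy, it fits the low-memory training setting the abstract emphasizes.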
Pages: 23
Related Papers
50 records in total
  • [1] Exploring the Potential of Low-Bit Training of Convolutional Neural Networks
    Zhong, Kai
    Ning, Xuefei
    Dai, Guohao
    Zhu, Zhenhua
    Zhao, Tianchen
    Zeng, Shulin
    Wang, Yu
    Yang, Huazhong
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (12) : 5421 - 5434
  • [2] Damped Newton Stochastic Gradient Descent Method for Neural Networks Training
    Zhou, Jingcheng
    Wei, Wei
    Zhang, Ruizhi
    Zheng, Zhiming
    MATHEMATICS, 2021, 9 (13)
  • [3] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [4] A TDA-based performance analysis for neural networks with low-bit weights
    Ogio, Yugo
    Tsubone, Naoki
    Minami, Yuki
    Ishikawa, Masato
    ARTIFICIAL LIFE AND ROBOTICS, 2025,
  • [5] HSB-GDM: a Hybrid Stochastic-Binary Circuit for Gradient Descent with Momentum in the Training of Neural Networks
    Li, Han
    Shi, Heng
    Jiang, Honglan
    Liu, Siting
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES, NANOARCH 2022, 2022,
  • [6] A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
    Jentzen, Arnulf
    Riekert, Adrian
    Zeitschrift für angewandte Mathematik und Physik, 2022, 73
  • [7] A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
    Jentzen, Arnulf
    Riekert, Adrian
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND PHYSIK, 2022, 73 (05):
  • [8] Learning Bilateral Clipping Parametric Activation for Low-Bit Neural Networks
    Ding, Yunlong
    Chen, Di-Rong
    MATHEMATICS, 2023, 11 (09)
  • [9] Feature Map-Aware Activation Quantization for Low-bit Neural Networks
    Lee, Seungjin
    Kim, Hyun
    2021 36TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC), 2021,
  • [10] Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
    Jentzen, Arnulf
    Welti, Timo
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 455