Stochastic Markov gradient descent and training low-bit neural networks

Cited by: 0
Authors
Ashbrock, Jonathan [1]
Powell, Alexander M. [2]
Affiliations
[1] MITRE Corp, McLean, VA 22102 USA
[2] Vanderbilt Univ, Dept Math, Nashville, TN 37240 USA
Source
SAMPLING THEORY SIGNAL PROCESSING AND DATA ANALYSIS | 2021, Vol. 19, No. 2
Keywords
Neural networks; Quantization; Stochastic gradient descent; Stochastic Markov gradient descent; Low-memory training;
DOI
10.1007/s43670-021-00015-1
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
The massive size of modern neural networks has motivated substantial recent interest in neural network quantization, especially low-bit quantization. We introduce Stochastic Markov Gradient Descent (SMGD), a discrete optimization method applicable to training quantized neural networks. The SMGD algorithm is designed for settings where memory is highly constrained during training. We provide theoretical guarantees of algorithm performance as well as encouraging numerical results.
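The abstract does not spell out the update rule, but a minimal sketch of a stochastic, Markov-style quantized update in the spirit of SMGD might look as follows (Python/NumPy; the grid spacing delta, the step size, and the move-probability rule are illustrative assumptions, not the authors' exact scheme):

import numpy as np

def smgd_step(w, grad, lr=0.01, delta=0.1, rng=None):
    # Illustrative quantized update (assumed form, not the paper's exact rule):
    # every weight stays on the grid {k * delta}. A weight moves one grid level
    # against the sign of its gradient with probability proportional to the
    # scaled gradient magnitude (capped at 1); otherwise it stays put, so all
    # iterates remain quantized throughout training.
    rng = np.random.default_rng() if rng is None else rng
    p_move = np.clip(lr * np.abs(grad) / delta, 0.0, 1.0)
    move = rng.random(w.shape) < p_move
    return w - delta * np.sign(grad) * move

# Toy usage: weights start at 0 and only ever occupy multiples of delta.
w = np.zeros(5)
g = np.array([0.3, -0.8, 0.05, 1.2, -0.4])
w = smgd_step(w, g)

Because an update of this form stores only the quantized weights rather than a full-precision shadow copy, it fits the low-memory training setting the abstract emphasizes.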
Pages: 23
Related Papers
50 records in total
  • [1] Exploring the Potential of Low-Bit Training of Convolutional Neural Networks
    Zhong, Kai
    Ning, Xuefei
    Dai, Guohao
    Zhu, Zhenhua
    Zhao, Tianchen
    Zeng, Shulin
    Wang, Yu
    Yang, Huazhong
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (12) : 5421 - 5434
  • [2] Damped Newton Stochastic Gradient Descent Method for Neural Networks Training
    Zhou, Jingcheng
    Wei, Wei
    Zhang, Ruizhi
    Zheng, Zhiming
    MATHEMATICS, 2021, 9 (13)
  • [3] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [4] A TDA-based performance analysis for neural networks with low-bit weights
    Ogio, Yugo
    Tsubone, Naoki
    Minami, Yuki
    Ishikawa, Masato
    ARTIFICIAL LIFE AND ROBOTICS, 2025,
  • [5] HSB-GDM: a Hybrid Stochastic-Binary Circuit for Gradient Descent with Momentum in the Training of Neural Networks
    Li, Han
    Shi, Heng
    Jiang, Honglan
    Liu, Siting
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL SYMPOSIUM ON NANOSCALE ARCHITECTURES, NANOARCH 2022, 2022,
  • [6] A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
    Jentzen, Arnulf
    Riekert, Adrian
    Zeitschrift für angewandte Mathematik und Physik, 2022, 73
  • [7] A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
    Jentzen, Arnulf
    Riekert, Adrian
    ZEITSCHRIFT FUR ANGEWANDTE MATHEMATIK UND PHYSIK, 2022, 73 (05):
  • [8] Learning Bilateral Clipping Parametric Activation for Low-Bit Neural Networks
    Ding, Yunlong
    Chen, Di-Rong
    MATHEMATICS, 2023, 11 (09)
  • [9] Feature Map-Aware Activation Quantization for Low-bit Neural Networks
    Lee, Seungjin
    Kim, Hyun
    2021 36TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC), 2021,
  • [10] Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
    Jentzen, Arnulf
    Welti, Timo
    APPLIED MATHEMATICS AND COMPUTATION, 2023, 455