Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TINYSCRIPT

Cited by: 0
|
Authors
Fu, Fangcheng [1 ,2 ,3 ]
Hu, Yuzheng [1 ,2 ]
He, Yihan [1 ,2 ]
Jiang, Jiawei [4 ]
Shao, Yingxia [5 ]
Zhang, Ce [4 ]
Cui, Bin [1 ,2 ,6 ,7 ]
Affiliations
[1] Peking Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Beijing, Peoples R China
[3] Tencent Inc, Shenzhen, Peoples R China
[4] Swiss Fed Inst Technol, Zurich, Switzerland
[5] BUPT, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China
[6] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[7] Peking Univ, Natl Engn Lab Big Data Anal & Applicat, Beijing, Peoples R China
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119
Funding
Swiss National Science Foundation; National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently via quantization-based compression, which helps DNN training in two ways: (1) activations are quantized to shrink memory consumption, and (2) gradients are quantized to decrease communication cost. However, existing methods mostly use a uniform mechanism that quantizes values evenly. Such a scheme may cause a large quantization variance and slow down convergence in practice. In this work, we introduce TINYSCRIPT, which applies a non-uniform quantization algorithm to both activations and gradients. TINYSCRIPT models the original values by a family of Weibull distributions and searches for "quantization knobs" that minimize the quantization variance. We also discuss the convergence of the non-uniform quantization algorithm on DNNs with varying depths, shedding light on the number of bits required for convergence. Experiments show that TINYSCRIPT always obtains a lower quantization variance and achieves model quality comparable to full-precision training while using 1-2 fewer bits than the uniform-based counterpart.
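As a concrete illustration of the idea in the abstract, the sketch below (assuming NumPy; it is not the authors' implementation, and all names, parameters, and the grid search are illustrative) contrasts uniform and non-uniform stochastic quantization on synthetic values whose magnitudes are assumed to be Weibull-distributed, and picks the non-uniform "quantization knobs" by a crude search for the lowest empirical quantization variance. TINYSCRIPT itself derives the knobs from a fitted Weibull model rather than from this brute-force stand-in.

# A minimal sketch, not the paper's implementation: compare uniform and
# non-uniform stochastic quantization of values assumed to follow a
# Weibull distribution, choosing non-uniform "knobs" by a crude grid
# search over warped quantile levels. All names are illustrative.
import numpy as np

def stochastic_quantize(x, points, rng):
    """Round each value in x to one of its two neighbouring `points`
    (sorted ascending), with probabilities chosen so the quantizer is
    unbiased: E[q(x)] = x for x inside [points[0], points[-1]]."""
    idx = np.clip(np.searchsorted(points, x, side="right") - 1, 0, len(points) - 2)
    lo, hi = points[idx], points[idx + 1]
    p_up = (x - lo) / (hi - lo)              # probability of rounding up
    return np.where(rng.random(x.shape) < p_up, hi, lo)

def quantization_variance(x, points, rng, trials=16):
    """Monte-Carlo estimate of the mean squared quantization error E[(q(x)-x)^2]."""
    return float(np.mean([np.mean((stochastic_quantize(x, points, rng) - x) ** 2)
                          for _ in range(trials)]))

rng = np.random.default_rng(0)
x = rng.weibull(1.5, size=50_000)            # stand-in for activation magnitudes
n_points = 2 ** 3                            # 3-bit quantization

# Uniform knobs: evenly spaced over the value range.
uniform = np.linspace(0.0, x.max(), n_points)

# Non-uniform knobs: place points at warped quantiles of the data and keep
# the warp exponent that yields the smallest estimated variance.
best_points, best_var = None, np.inf
for gamma in np.linspace(0.5, 2.5, 17):
    levels = np.linspace(0.0, 1.0, n_points) ** gamma
    cand = np.quantile(x, levels)
    cand[0], cand[-1] = 0.0, x.max()         # keep the full value range covered
    var = quantization_variance(x, cand, rng)
    if var < best_var:
        best_points, best_var = cand, var

print("uniform variance    :", quantization_variance(x, uniform, rng))
print("non-uniform variance:", best_var)

Because the non-uniform points concentrate where the assumed Weibull mass lies, the estimated quantization variance at a given bit width is typically lower than with evenly spaced points, which is the effect the abstract attributes to TINYSCRIPT.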
Pages: 11
Related Papers
5 records in total
  • [1] Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TINYSCRIPT
    Fu, Fangcheng
    Hu, Yuzheng
    He, Yihan
    Jiang, Jiawei
    Shao, Yingxia
    Zhang, Ce
    Cui, Bin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [2] Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations
    Nam, Woo-Jeoung
    Choi, Jaesik
    Lee, Seong-Whan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11604 - 11612
  • [3] Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
    Zhang, Jiong
    Lei, Qi
    Dhillon, Inderjit S.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [4] Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations
    Shi, Guang
    Zhang, Jiangshe
    Li, Huirong
    Wang, Changpeng
    NEURAL PROCESSING LETTERS, 2019, 50 (01) : 57 - 75
  • [5] Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients
    Schulze, Soeren
    Leuschner, Johannes
    King, Emily J.
    SIGNALS, 2021, 2 (04): : 637 - 661