Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TINYSCRIPT

Cited by: 0
|
Authors
Fu, Fangcheng [1 ,2 ,3 ]
Hu, Yuzheng [1 ,2 ]
He, Yihan [1 ,2 ]
Jiang, Jiawei [4 ]
Shao, Yingxia [5 ]
Zhang, Ce [4 ]
Cui, Bin [1 ,2 ,6 ,7 ]
Affiliations
[1] Peking Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
[2] Peking Univ, Key Lab High Confidence Software Technol MOE, Beijing, Peoples R China
[3] Tencent Inc, Shenzhen, Peoples R China
[4] Swiss Fed Inst Technol, Zurich, Switzerland
[5] BUPT, Beijing Key Lab Intelligent Telecommun Software &, Beijing, Peoples R China
[6] Peking Univ, Ctr Data Sci, Beijing, Peoples R China
[7] Peking Univ, Natl Engn Lab Big Data Anal & Applicat, Beijing, Peoples R China
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119
Funding
Swiss National Science Foundation; National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently via quantization-based compression, which helps DNN training in two ways: (1) activations are quantized to shrink memory consumption, and (2) gradients are quantized to decrease communication cost. However, existing methods mostly use a uniform mechanism that quantizes values evenly. Such a scheme may cause a large quantization variance and slow down convergence in practice. In this work, we introduce TINYSCRIPT, which applies a non-uniform quantization algorithm to both activations and gradients. TINYSCRIPT models the original values by a family of Weibull distributions and searches for "quantization knobs" that minimize the quantization variance. We also discuss the convergence of the non-uniform quantization algorithm on DNNs with varying depths, shedding light on the number of bits required for convergence. Experiments show that TINYSCRIPT always obtains a lower quantization variance and achieves model quality comparable to full-precision training while using 1-2 fewer bits than the uniform-based counterpart.
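As a concrete illustration of the idea in the abstract, the sketch below (assuming NumPy; it is not the authors' implementation, and all names, parameters, and the grid search are illustrative) contrasts uniform and non-uniform stochastic quantization on synthetic values whose magnitudes are assumed to be Weibull-distributed, and picks the non-uniform "quantization knobs" by a crude search for the lowest empirical quantization variance. TINYSCRIPT itself derives the knobs from a fitted Weibull model rather than from this brute-force stand-in.

# A minimal sketch, not the paper's implementation: compare uniform and
# non-uniform stochastic quantization of values assumed to follow a
# Weibull distribution, choosing non-uniform "knobs" by a crude grid
# search over warped quantile levels. All names are illustrative.
import numpy as np

def stochastic_quantize(x, points, rng):
    """Round each value in x to one of its two neighbouring `points`
    (sorted ascending), with probabilities chosen so the quantizer is
    unbiased: E[q(x)] = x for x inside [points[0], points[-1]]."""
    idx = np.clip(np.searchsorted(points, x, side="right") - 1, 0, len(points) - 2)
    lo, hi = points[idx], points[idx + 1]
    p_up = (x - lo) / (hi - lo)              # probability of rounding up
    return np.where(rng.random(x.shape) < p_up, hi, lo)

def quantization_variance(x, points, rng, trials=16):
    """Monte-Carlo estimate of the mean squared quantization error E[(q(x)-x)^2]."""
    return float(np.mean([np.mean((stochastic_quantize(x, points, rng) - x) ** 2)
                          for _ in range(trials)]))

rng = np.random.default_rng(0)
x = rng.weibull(1.5, size=50_000)            # stand-in for activation magnitudes
n_points = 2 ** 3                            # 3-bit quantization

# Uniform knobs: evenly spaced over the value range.
uniform = np.linspace(0.0, x.max(), n_points)

# Non-uniform knobs: place points at warped quantiles of the data and keep
# the warp exponent that yields the smallest estimated variance.
best_points, best_var = None, np.inf
for gamma in np.linspace(0.5, 2.5, 17):
    levels = np.linspace(0.0, 1.0, n_points) ** gamma
    cand = np.quantile(x, levels)
    cand[0], cand[-1] = 0.0, x.max()         # keep the full value range covered
    var = quantization_variance(x, cand, rng)
    if var < best_var:
        best_points, best_var = cand, var

print("uniform variance    :", quantization_variance(x, uniform, rng))
print("non-uniform variance:", best_var)

Because the non-uniform points concentrate where the assumed Weibull mass lies, the estimated quantization variance at a given bit width is typically lower than with evenly spaced points, which is the effect the abstract attributes to TINYSCRIPT.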
Pages: 11
Related Papers
5 records in total
  • [1] Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TINYSCRIPT
    Fu, Fangcheng
    Hu, Yuzheng
    He, Yihan
    Jiang, Jiawei
    Shao, Yingxia
    Zhang, Ce
    Cui, Bin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [2] Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations
    Nam, Woo-Jeoung
    Choi, Jaesik
    Lee, Seong-Whan
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 11604 - 11612
  • [3] Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
    Zhang, Jiong
    Lei, Qi
    Dhillon, Inderjit S.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [4] Enhance the Performance of Deep Neural Networks via L2 Regularization on the Input of Activations
    Shi, Guang
    Zhang, Jiangshe
    Li, Huirong
    Wang, Changpeng
    NEURAL PROCESSING LETTERS, 2019, 50 (01) : 57 - 75
  • [5] Blind Source Separation in Polyphonic Music Recordings Using Deep Neural Networks Trained via Policy Gradients
    Schulze, Soeren
    Leuschner, Johannes
    King, Emily J.
    SIGNALS, 2021, 2 (04): : 637 - 661