GWQ: Group-Wise Quantization Framework for Neural Networks

Times Cited: 0
Authors
Yang, Jiaming
Tang, Chenwei [1 ]
Yu, Caiyang
Lv, Jiancheng [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Source
ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222 | 2023 / Vol. 222
Funding
U.S. National Science Foundation;
Keywords
Int-Only Quantization; Scale Factor Conversion; Quantization-Aware-Training; Deep Neural Networks; Grouping-Wise Algorithm;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
As the most commonly used quantization techniques for deep neural networks, int-only quantization methods use a scale factor to linearly approximate the weights or activations of each layer. However, when passing activation data between layers, such int-only quantization methods require extra Scale Factor Conversion (SFC) operations, resulting in computational overhead. In this paper, we propose a Group-Wise Quantization framework, called GWQ, that reduces the computational cost of passing activation data between layers by allowing multiple layers to share one scale factor in SFC operations. Specifically, within the GWQ framework, we propose two algorithms: one for grouping network layers and one for model training. For grouping, we propose an algorithm based on the similarity of the layers' numerical data distributions. Network layers assigned to the same group are then quantized with a single shared scale factor, reducing computational consumption. To address the additional performance loss caused by sharing scale factors among multiple layers, we propose a training algorithm that jointly optimizes these shared scale factors and the model parameters by introducing a learnable power-of-two scaling parameter for each layer. Extensive experiments demonstrate that the proposed GWQ framework effectively reduces the computational burden during inference while having a negligible impact on model performance.
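As a rough illustration of the overhead the abstract refers to, the following minimal numpy sketch contrasts a conventional per-layer Scale Factor Conversion (a floating-point rescale between two layers with different scales) with a group-wise setup in which layers share one scale and any per-layer adjustment is restricted to a power of two, so the conversion reduces to an integer shift. All names here (quantize, group_scale, pow2_shift, etc.) are illustrative assumptions, not identifiers from the paper.

    # Minimal sketch, assuming symmetric uniform int-only quantization.
    import numpy as np

    def quantize(x, scale, bits=8):
        """Uniform symmetric quantization: q = clip(round(x / scale))."""
        qmax = 2 ** (bits - 1) - 1
        return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)

    def dequantize(q, scale):
        return q * scale

    x = np.random.randn(4, 8).astype(np.float32)

    # Conventional case: two consecutive layers use different scales,
    # so passing activations requires a Scale Factor Conversion (float rescale).
    s1, s2 = 0.05, 0.02
    q1 = quantize(x, s1)
    q_sfc = np.round(q1 * (s1 / s2))          # extra per-tensor float multiply

    # Group-wise case (illustrative): both layers share one group scale, and the
    # per-layer adjustment is a power of two, so conversion is an integer shift.
    group_scale = 0.04
    pow2_shift = 1                            # hypothetical learnable exponent, fixed here
    qg = quantize(x, group_scale)
    q_shifted = qg >> pow2_shift              # effective scale becomes group_scale * 2**pow2_shift

    print(np.abs(dequantize(q_sfc, s2) - x).mean())
    print(np.abs(dequantize(q_shifted, group_scale * 2 ** pow2_shift) - x).mean())

This sketch only shows the underlying arithmetic; the paper's distribution-similarity grouping algorithm and the training procedure that learns the power-of-two parameters are not reproduced here.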
Pages: 16