Post-training quantization for re-parameterization via coarse & fine weight splitting

Cited by: 7
Authors
Yang, Dawei [1 ]
He, Ning [2 ,3 ]
Hu, Xing [2 ]
Yuan, Zhihang [2 ]
Yu, Jiangyong [2 ]
Xu, Chen [2 ]
Jiang, Zhe [3 ,4 ]
Affiliations
[1] Nanjing Inst Technol, Sch Comp Engn, Nanjing, Peoples R China
[2] Houmo AI, Nanjing, Peoples R China
[3] Southeast Univ, Nanjing, Peoples R China
[4] Univ Cambridge, Cambridge, England
Funding
Swedish Research Council
Keywords
PTQ; CNN; Quantization;
DOI
10.1016/j.sysarc.2024.103065
CLC classification
TP3 (computing technology, computer technology)
Discipline code
0812
Abstract
Although neural networks have made remarkable advancements in various applications, they require substantial computational and memory resources. Network quantization is a powerful technique to compress neural networks, allowing for more efficient and scalable AI deployments. Recently, re-parameterization has emerged as a promising technique to enhance model performance while simultaneously alleviating the computational burden in various computer vision tasks. However, accuracy drops significantly when quantization is applied to re-parameterized networks. We identify that the primary challenge arises from the large variation in weight distribution across the original branches. To address this issue, we propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights, and develop an improved KL metric to determine optimal quantization scales for activations. To the best of our knowledge, our approach is the first to enable post-training quantization of re-parameterized networks. For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss. The code is at https://github.com/NeonHo/Coarse-Fine-Weight-Split.git
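The abstract's core idea, that a single quantization scale is a poor fit when weights from merged branches have very different magnitude ranges, can be illustrated with a minimal sketch. The sketch below is a hypothetical interpretation, not the paper's exact CFWS algorithm: it splits a weight tensor at a percentile threshold into a "coarse" bulk and a "fine" outlier residual, quantizing each with its own scale, so that a few large-magnitude weights no longer inflate the scale used for the majority.

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Symmetric uniform quantize-dequantize with a single scale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def coarse_fine_split_quantize(w, bits=8, coarse_pct=99.0):
    """Illustrative coarse & fine weight splitting (hypothetical sketch).

    The bulk of the distribution (clipped at a percentile threshold) is
    quantized with a small scale; the outlier residual is quantized
    separately with its own scale and added back.
    """
    qmax = 2 ** (bits - 1) - 1
    t = np.percentile(np.abs(w), coarse_pct)   # threshold between coarse and fine
    coarse = np.clip(w, -t, t)                 # bulk of the distribution
    fine = w - coarse                          # nonzero only for outliers
    w_q = quantize(coarse, t / qmax, bits)
    fine_scale = np.abs(fine).max() / qmax
    if fine_scale > 0:
        w_q += quantize(fine, fine_scale, bits)
    return w_q

# Heavy-tailed weights, mimicking merged branches with very different ranges
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, 4096)
w[:8] = rng.normal(0.0, 0.5, 8)               # a few large-magnitude weights
naive = quantize(w, np.abs(w).max() / 127)    # one scale for everything
split = coarse_fine_split_quantize(w)
print("naive error:", np.abs(w - naive).mean())
print("split error:", np.abs(w - split).mean())
```

For this kind of heavy-tailed distribution the split scheme yields a noticeably lower mean quantization error than the naive single-scale baseline, which is the failure mode the abstract attributes to re-parameterized networks.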
Pages: 9
Cited references
30 records in total
[1] Chmiel B., 2020, Proc. 34th Conf. on Neural Information Processing Systems, V33, P5308
[2] Chu X. X., 2023, arXiv, DOI arXiv:2212.01593
[3] Davoodi P., 2019, GPU Technology Conference
[4] Deng J., 2009, Proc. CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[5] Ding X. H., 2022, arXiv, DOI arXiv:2205.15242
[6] Ding X., Zhang X., Han J., Ding G. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. IEEE/CVF CVPR 2022, pp. 11953-11965
[7] Ding X., Chen H., Zhang X., Han J., Ding G. RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality. IEEE/CVF CVPR 2022, pp. 568-577
[8] Ding X., Hao T., Tan J., Liu J., Han J., Guo Y., Ding G. ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting. IEEE/CVF ICCV 2021, pp. 4490-4500
[9] Ding X., Zhang X., Ma N., Han J., Ding G., Sun J. RepVGG: Making VGG-style ConvNets Great Again. IEEE/CVF CVPR 2021, pp. 13728-13737
[10] Gholami A., 2021, A Survey of Quantization Methods for Efficient Neural Network Inference