Hybrid and non-uniform quantization methods using retro synthesis data for efficient inference

Cited by: 1
Authors
Pratap, Tej G. V. S. L. [1]
Kumar, Raja [1]
Pradeep, N. S. [1]
Affiliations
[1] Samsung Res Inst, On Device AI, Bangalore, Karnataka, India
Keywords
data-free quantization; quantization; DNN inference; synthetic data; model compression
DOI
10.1109/IJCNN52387.2021.9533724
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing quantization-aware training methods and post-training quantization methods attempt to compensate for quantization loss by leveraging training data. Hence, these methods are not effective for privacy-constrained applications, as they are tightly coupled with training data. In contrast, this paper proposes a data-independent post-training quantization scheme that eliminates the need for training data. This is achieved by generating a faux dataset, hereafter referred to as 'Retro-Synthesis Data', from the FP32 model layer statistics and then using it for quantization. This approach outperformed state-of-the-art methods including, but not limited to, ZeroQ and DFQ on models with and without Batch-Normalization layers at 8-, 6-, and 4-bit precision on the ImageNet and CIFAR-10 datasets. We also introduced two futuristic variants of post-training quantization, namely 'Hybrid Quantization' and 'Non-Uniform Quantization'. The Hybrid Quantization scheme determines the sensitivity of each layer to per-tensor and per-channel quantization, and thereby generates hybrid quantized models that are 10 to 20% more efficient in inference time while achieving the same or better accuracy compared to the per-channel quantization scheme. This method also exceeded FP32 accuracy when applied to the ResNet-18 and ResNet-50 models on the ImageNet dataset. In the proposed Non-Uniform Quantization scheme, the weights are grouped into clusters, and each cluster is assigned a different number of quantization steps depending on the number of weights in the cluster and their range. This method resulted in a 1% accuracy improvement over state-of-the-art methods on the ImageNet dataset.
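To make the per-tensor versus per-channel choice in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. The abstract does not say how layer sensitivity is measured, so this sketch assumes weight reconstruction error (MSE) as the metric and a hypothetical `tolerance` threshold as the decision rule; all function names are illustrative.

```python
def quantize_dequantize(values, bits):
    """Symmetric uniform quantization of a list of floats, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax  # one scale shared by every value in `values`
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def per_tensor_error(weight, bits):
    """Error when all channels of a layer share a single scale."""
    flat = [v for channel in weight for v in channel]
    return mse(flat, quantize_dequantize(flat, bits))


def per_channel_error(weight, bits):
    """Error when each output channel gets its own scale."""
    flat = [v for channel in weight for v in channel]
    deq = [v for channel in weight for v in quantize_dequantize(channel, bits)]
    return mse(flat, deq)


def choose_scheme(weight, bits=8, tolerance=1.5):
    """Assumed rule: keep the cheaper per-tensor scheme unless its error
    exceeds `tolerance` times the per-channel error."""
    e_tensor = per_tensor_error(weight, bits)
    e_channel = per_channel_error(weight, bits)
    return "per-tensor" if e_tensor <= tolerance * e_channel else "per-channel"


# Toy layers: each inner list is one output channel's weights.
similar_ranges = [[0.5, -1.0, 0.25], [1.0, -0.5, 0.75]]
mixed_ranges = [[1.0, -1.0, 1.0], [0.01, -0.008, 0.006]]

print(choose_scheme(similar_ranges, bits=8))
print(choose_scheme(mixed_ranges, bits=4))
```

In this sketch, a layer whose channels span similar ranges loses nothing from a shared scale, so it stays per-tensor, while a layer with one very small-range channel is sent to per-channel quantization. A real sensitivity analysis would more likely compare layer outputs or end-task accuracy on calibration data rather than weight error alone.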
Pages: 8