Hybrid and non-uniform quantization methods using retro synthesis data for efficient inference

Cited by: 1
Authors
Pratap, Tej G. V. S. L. [1]
Kumar, Raja [1]
Pradeep, N. S. [1]
Affiliation
[1] Samsung Res Inst, On Device AI, Bangalore, Karnataka, India
Keywords
data free quantization; quantization; DNN inference; synthetic data; model compression;
DOI
10.1109/IJCNN52387.2021.9533724
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing quantization-aware training and post-training quantization methods attempt to compensate for quantization loss by leveraging training data. Because they are tightly coupled to that data, these methods are not effective for privacy-constrained applications. In contrast, this paper proposes a data-independent post-training quantization scheme that eliminates the need for training data. This is achieved by generating a faux dataset, hereafter referred to as 'Retro-Synthesis Data', from the FP32 model's layer statistics and then using it for quantization. This approach outperformed state-of-the-art methods, including ZeroQ and DFQ, on models with and without batch-normalization layers at 8-, 6-, and 4-bit precision on the ImageNet and CIFAR-10 datasets. We also introduce two futuristic variants of post-training quantization, namely 'Hybrid Quantization' and 'Non-Uniform Quantization'. The Hybrid Quantization scheme determines the sensitivity of each layer to per-tensor and per-channel quantization, and thereby generates hybrid quantized models that are 10 to 20% more efficient in inference time while achieving the same or better accuracy compared to the per-channel quantization scheme. This method also outperformed FP32 accuracy when applied to ResNet-18 and ResNet-50 models on the ImageNet dataset. In the proposed Non-Uniform Quantization scheme, the weights are grouped into clusters, and each cluster is assigned a varying number of quantization steps depending on the number of weights and their range in that cluster. This method resulted in a 1% accuracy improvement over state-of-the-art methods on the ImageNet dataset.
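The per-layer choice between per-tensor and per-channel quantization described in the abstract can be illustrated with a minimal sketch. The abstract does not state how layer sensitivity is measured, so everything below is an assumption made for illustration: the sensitivity proxy (mean squared error on the weights after a symmetric quantize-dequantize round trip), the `tolerance` threshold, and all function names (`choose_scheme`, `per_tensor`, `per_channel`, and so on) are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # one row per output channel (illustrative layout)


def scale_for(values: List[float], bits: int = 8) -> float:
    """Symmetric scale that maps the largest magnitude to the top integer level."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_row(row: List[float], scale: float, bits: int = 8) -> List[float]:
    """Quantize to signed integers, clamp, then dequantize back to floats."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in row]


def per_tensor(weights: Matrix, bits: int = 8) -> Matrix:
    """One shared scale for the whole weight tensor."""
    scale = scale_for([w for row in weights for w in row], bits)
    return [quantize_row(row, scale, bits) for row in weights]


def per_channel(weights: Matrix, bits: int = 8) -> Matrix:
    """A separate scale for each output channel (row)."""
    return [quantize_row(row, scale_for(row, bits), bits) for row in weights]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared difference between two equally shaped matrices."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def choose_scheme(weights: Matrix, bits: int = 8, tolerance: float = 1e-6) -> str:
    """Assumed rule: keep the cheaper per-tensor scheme unless it adds
    more than `tolerance` of weight error over per-channel."""
    err_tensor = mse(weights, per_tensor(weights, bits))
    err_channel = mse(weights, per_channel(weights, bits))
    return "per-tensor" if err_tensor - err_channel <= tolerance else "per-channel"
```

Under these assumptions, a layer whose channels share a similar range, such as `[[0.5, -0.25], [-0.5, 0.1]]`, stays per-tensor, while a layer with very different channel ranges, such as `[[10.0, -5.0], [0.01, -0.02]]`, is assigned per-channel scales. A sensitivity measure based on model outputs over synthetic data, rather than on weight error, would fit the same structure.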
Pages: 8