RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers

Cited by: 16
Authors
Li, Zhikai [1 ,2 ]
Xiao, Junrui [1 ,2 ]
Yang, Lianwei [1 ,2 ]
Gu, Qingyi [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICCV51070.2023.01580
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a lightweight and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. RepQ-ViT decouples the quantization and inference processes, where the former employs complex quantizers and the latter employs scale-reparameterized simplified quantizers. This ensures both accurate quantization and efficient inference, which distinguishes it from existing approaches that sacrifice quantization performance to meet the target hardware. More specifically, we focus on two components with extreme distributions: post-LayerNorm activations with severe inter-channel variation and post-Softmax activations with power-law features, and initially apply channel-wise quantization and log√2 quantization, respectively. Then, we reparameterize the scales to hardware-friendly layer-wise quantization and log2 quantization for inference, at only slight accuracy or computational cost. Extensive experiments are conducted on multiple vision tasks with different model variants, proving that RepQ-ViT, without hyperparameters and expensive reconstruction procedures, can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level. Code is available at https://github.com/zkkli/RepQ-ViT.
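As a rough illustration of the channel-wise-to-layer-wise step described in the abstract, the following sketch folds channel-wise quantization parameters of a post-LayerNorm activation into the LayerNorm affine parameters and the subsequent linear layer, so that a single layer-wise scale reproduces the same quantization grid. This is a minimal NumPy reconstruction from the abstract alone, not the official implementation; the function name, the use of per-channel means as the layer-wise targets, and the (out_features, in_features) weight layout are assumptions.

import numpy as np

def reparameterize_layernorm_scales(gamma, beta, W_next, b_next, s, z):
    # Target layer-wise scale/zero-point: per-channel means (an assumption).
    s_t = s.mean()
    z_t = z.mean()
    r1 = s / s_t          # per-channel scale variation factors
    r2 = z - z_t          # per-channel zero-point offsets
    # Transform the LayerNorm output x -> x/r1 + s_t*r2, which maps the
    # channel-wise grid x/s + z onto the layer-wise grid x'/s_t + z_t.
    gamma_t = gamma / r1
    beta_t = beta / r1 + s_t * r2
    # Compensate in the next linear layer so W_next @ x + b_next is unchanged:
    # x = r1 * (x' - s_t*r2)  =>  scale columns by r1 and shift the bias.
    W_t = W_next * r1[None, :]
    b_t = b_next - W_next @ (s * r2)
    return gamma_t, beta_t, W_t, b_t, s_t, z_t

# Numerical self-check on random data.
rng = np.random.default_rng(0)
C, O = 8, 4
gamma, beta = rng.normal(size=C), rng.normal(size=C)
W, b = rng.normal(size=(O, C)), rng.normal(size=O)
s = rng.uniform(0.01, 0.5, size=C)
z = rng.integers(-8, 8, size=C).astype(float)
h = rng.normal(size=C)                      # normalized LayerNorm input
x = gamma * h + beta                        # post-LayerNorm activation
g_t, be_t, W_t, b_t, s_t, z_t = reparameterize_layernorm_scales(gamma, beta, W, b, s, z)
x_t = g_t * h + be_t
assert np.allclose(x / s + z, x_t / s_t + z_t)      # same quantization grid
assert np.allclose(W @ x + b, W_t @ x_t + b_t)      # next-layer output preserved

The self-check confirms the two properties the abstract relies on: the transformed activation quantizes to the same integer grid under the single layer-wise scale, and the compensated next layer leaves the network's output untouched.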
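The post-Softmax side can be sketched similarly. Below is a hedged reconstruction of a log√2 quantizer and the base-2 decomposition that makes an inference-time log2 reparameterization hardware-friendly; the function names and the 4-bit default are illustrative, not taken from the repository.

import numpy as np

def log_sqrt2_quantize(x, n_bits=4, eps=1e-12):
    # q = round(-log_sqrt2(x)) = round(-2*log2(x)) for x in (0, 1].
    q = np.round(-2.0 * np.log2(np.maximum(x, eps)))
    return np.clip(q, 0, 2 ** n_bits - 1).astype(np.int64)

def log_sqrt2_dequantize(q):
    # x_hat = 2^(-q/2). Writing q = 2k + r with r in {0, 1} gives
    # x_hat = 2^(-k) * (1/sqrt(2))^r: even codes are pure bit-shifts,
    # and odd codes add a single fixed 1/sqrt(2) factor, exposing the
    # base-2 structure a log2 quantizer can exploit at inference.
    k, r = q // 2, q % 2
    return 2.0 ** (-k.astype(float)) * (1.0 / np.sqrt(2.0)) ** r

attn = np.array([0.70, 0.20, 0.07, 0.03])   # one softmax row
q = log_sqrt2_quantize(attn)
print(q, log_sqrt2_dequantize(q))

Compared with plain log2 quantization, the √2 base halves the step size in the log domain, which is why the paper applies it to the sharply power-law-distributed attention scores before reparameterizing for deployment.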
Pages: 17181-17190
Number of pages: 10