PSAQ-ViT V2: Toward Accurate and General Data-Free Quantization for Vision Transformers

Cited by: 9
Authors
Li, Zhikai [1 ,2 ]
Chen, Mengjuan [1 ]
Xiao, Junrui [1 ,2 ]
Gu, Qingyi [1 ]
Affiliations
[1] Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
[2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Funding
National Natural Science Foundation of China;
Keywords
Data-free quantization; model compression; patch similarity; quantized vision transformers (ViTs);
DOI
10.1109/TNNLS.2023.3301007
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Data-free quantization can potentially address data privacy and security concerns in model compression and thus has been widely investigated. Recently, patch similarity aware data-free quantization for vision transformers (PSAQ-ViT) designed a relative value metric, patch similarity, to generate data from pretrained vision transformers (ViTs), making the first attempt at data-free quantization for ViTs. In this article, we propose PSAQ-ViT V2, a more accurate and general data-free quantization framework for ViTs, built on top of PSAQ-ViT. More specifically, following the patch similarity metric in PSAQ-ViT, we introduce an adaptive teacher-student strategy, which facilitates the continuous cyclic evolution of the generated samples and the quantized model (student) in a competitive and interactive fashion under the supervision of the full-precision (FP) model (teacher), thus significantly improving the accuracy of the quantized model. Moreover, instead of auxiliary category guidance, we employ task- and model-independent prior information, making this general-purpose scheme compatible with a broad range of vision tasks and models. Extensive experiments are conducted on various models across image classification, object detection, and semantic segmentation tasks, and PSAQ-ViT V2, with a naive quantization strategy and without access to real-world data, consistently achieves competitive results, showing potential as a powerful baseline for data-free quantization of ViTs. For instance, with Swin-S as the (backbone) model, 8-bit quantization reaches 82.13% top-1 accuracy on ImageNet, 50.9 box AP and 44.1 mask AP on COCO, and 47.2 mean Intersection over Union (mIoU) on ADE20K. We hope that the accurate and general PSAQ-ViT V2 can serve as a practical solution for real-world applications involving sensitive data. Code is released at: https://github.com/zkkli/PSAQ-ViT.
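To make the adaptive teacher-student strategy concrete, the following is a minimal sketch of one cyclic step, assuming PyTorch. Every name here (`cyclic_step`, the toy generator, teacher, and student modules, the KL-based losses, and all hyperparameters) is an illustrative assumption rather than the authors' released code; in the actual framework, the teacher is a pretrained FP ViT, the student is its quantized counterpart, and the generator is additionally regularized by the patch-similarity prior of PSAQ-ViT.

```python
# Minimal sketch of the adaptive teacher-student cycle (assumed PyTorch).
# All module/function names and hyperparameters are illustrative, not the
# authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_disagreement(student_logits, teacher_logits):
    """KL divergence from teacher predictions to student predictions."""
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")

def cyclic_step(generator, student_q, teacher_fp, g_opt, s_opt,
                batch=8, z_dim=128):
    device = next(teacher_fp.parameters()).device

    # (1) Generator step (competitive): craft samples on which the
    # quantized student disagrees most with the FP teacher.
    z = torch.randn(batch, z_dim, device=device)
    fake = generator(z)
    with torch.no_grad():
        t_logits = teacher_fp(fake)
    g_loss = -kl_disagreement(student_q(fake), t_logits)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # (2) Student step (interactive): distill the FP teacher on freshly
    # generated hard samples, closing the cycle.
    with torch.no_grad():
        fake = generator(torch.randn(batch, z_dim, device=device))
        t_logits = teacher_fp(fake)
    s_opt.zero_grad()
    s_loss = kl_disagreement(student_q(fake), t_logits)
    s_loss.backward()
    s_opt.step()
    return g_loss.item(), s_loss.item()

# Toy stand-ins so the sketch runs end to end; real use would plug in a
# pretrained FP ViT (teacher), its quantized copy (student), and an image
# generator regularized by the patch-similarity prior of PSAQ-ViT.
if __name__ == "__main__":
    gen = nn.Sequential(nn.Linear(128, 3 * 32 * 32), nn.Tanh(),
                        nn.Unflatten(1, (3, 32, 32)))
    teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    for p in teacher.parameters():
        p.requires_grad_(False)
    g_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    s_opt = torch.optim.Adam(student.parameters(), lr=1e-4)
    print(cyclic_step(gen, student, teacher, g_opt, s_opt))
```

The two alternating updates capture the competitive-interactive cycle described in the abstract: the generator seeks samples on which the quantized student diverges from the FP teacher, and the student then distills the teacher on those hard samples.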
Pages: 17227-17238
Number of pages: 12