DynaSlim: Dynamic Slimming for Vision Transformers

Cited by: 0
Authors
Shi, Da [1 ]
Gao, Jingsheng [1 ]
Liu, Ting [1 ]
Fu, Yuzhuo [1 ]
Affiliations
[1] Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, Shanghai, People's Republic of China
Source
2023 IEEE International Conference on Multimedia and Expo (ICME) | 2023
Funding
National Natural Science Foundation of China
Keywords
Model compression; vision transformer
DOI
10.1109/ICME55011.2023.00251
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Vision transformers (ViTs) have achieved significant performance on various vision tasks. However, their high computational and memory costs hinder edge deployment. Existing compression methods impose a static accuracy-efficiency constraint during sparsification; this static constraint limits sparsification efficiency, and its initialization relies heavily on human expertise. We propose DynaSlim, a dynamic slimming strategy for ViTs that adapts the accuracy-efficiency constraint during sparsification. We first equip multiple dimensions, including input tokens, Multi-head Self-Attention (MSA), and the Multilayer Perceptron (MLP), with fine-grained, adjustable sparsity weights that act as scaling factors between accuracy and efficiency. We then employ heuristic search for these non-differentiable factors and combine the search with regularization-based sparsification to obtain the optimal sparsified model. Finally, we compress and retrain the sparsified model under various budgets to obtain the resulting submodels. Experiments show that DynaSlim outperforms previous state-of-the-art methods under different budgets; for example, it reduces both the parameters and FLOPs of DeiT-B by 39% while increasing its ImageNet-1K accuracy by 1.9%. Moreover, we demonstrate the transferability of our compressed models on several downstream datasets.
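To make the search-plus-regularization recipe concrete, here is a minimal Python sketch. All names and the toy objective are hypothetical, not taken from the paper: it shows per-dimension sparsity weights scaling an L1 regularizer on learnable gate scores, with the non-differentiable weights tuned by a simple random-mutation heuristic search.

```python
import random

import torch

# Prunable dimensions of a ViT block, as named in the abstract:
# input tokens, Multi-head Self-Attention, and the MLP.
DIMS = ["tokens", "msa_heads", "mlp_neurons"]


def sparsity_loss(gates, weights):
    """L1 sparsity regularizer, scaled per dimension by its sparsity weight."""
    return sum(weights[d] * gates[d].abs().sum() for d in DIMS)


def proxy_score(weights):
    """Hypothetical stand-in for 'accuracy of a short sparsification run
    minus budget violation'; a real pipeline would train and evaluate here."""
    return -sum((w - 0.5) ** 2 for w in weights.values())


def heuristic_search(rounds=50):
    """Random-mutation search over the non-differentiable sparsity weights."""
    best = {d: 0.1 for d in DIMS}
    best_score = proxy_score(best)
    for _ in range(rounds):
        cand = {d: max(1e-4, w * random.uniform(0.5, 2.0))
                for d, w in best.items()}
        score = proxy_score(cand)
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    weights = heuristic_search()                       # step 1: tune factors
    gates = {d: torch.randn(16, requires_grad=True)    # learnable gate scores
             for d in DIMS}
    task_loss = torch.tensor(0.0)                      # stand-in for CE loss
    total = task_loss + sparsity_loss(gates, weights)  # step 2: joint objective
    total.backward()  # gradients push low-importance gates toward zero
    print({d: round(w, 4) for d, w in weights.items()})
```

In the full method, the search objective would be driven by measured accuracy and FLOPs rather than this toy quadratic, and gates with near-zero scores would be pruned before budget-constrained retraining.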
Pages: 1451-1456
Page count: 6