DynaSlim: Dynamic Slimming for Vision Transformers

Cited: 0
Authors
Shi, Da [1 ]
Gao, Jingsheng [1 ]
Liu, Ting [1 ]
Fu, Yuzhuo [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2023
Funding
National Natural Science Foundation of China
Keywords
Model compression; vision transformer
DOI
10.1109/ICME55011.2023.00251
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Vision transformers (ViTs) have achieved significant performance on various vision tasks. However, high computational and memory costs hinder their edge deployment. Existing compression methods employ static constraints between accuracy and efficiency during sparsification. These static constraints restrict sparsification efficiency, and their initialization relies heavily on human expertise. We propose a dynamic slimming strategy for ViTs, DynaSlim, to achieve an adaptive accuracy-efficiency constraint during sparsification. We first introduce fine-grained, adjustable sparsity weights (the scaling factors between accuracy and efficiency) for multiple dimensions, including input tokens, Multi-head Self-Attention (MSA), and the Multilayer Perceptron (MLP). We then employ heuristic search over these non-differentiable factors and combine the search with regularization-based sparsification to obtain the optimal sparsified model. Finally, we compress and retrain the sparsified model under various budgets to obtain the resulting submodels. Experiments show that DynaSlim outperforms previous state-of-the-art methods under different budgets. For example, we reduce both the parameters and FLOPs of DeiT-B by 39% while increasing its accuracy by 1.9% on ImageNet-1K. Moreover, we demonstrate the transferability of our compressed models on several downstream datasets.
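The abstract's pipeline (search over non-differentiable sparsity weights, then sparsify under the found trade-off) can be illustrated with a toy sketch. Everything below is an assumption for illustration only, not the authors' implementation: the dimension names, the greedy random-walk search, and the stand-in objective (a FLOPs proxy plus a quadratic accuracy-drop penalty) are all made up to show the shape of the search step.

```python
import random

# Hypothetical per-dimension sparsity weights, mirroring the three
# dimensions named in the abstract: input tokens, MSA, and MLP.
DIMS = ["tokens", "msa", "mlp"]

def objective(weights):
    """Toy accuracy-efficiency trade-off (illustrative constants only)."""
    flops = sum(1.0 - weights[d] for d in DIMS)     # proxy: compute kept
    acc_drop = sum(weights[d] ** 2 for d in DIMS)   # proxy: accuracy lost
    return flops + 2.0 * acc_drop                   # lower is better

def heuristic_search(steps=200, seed=0):
    """Greedy random-walk search over the non-differentiable weights."""
    rng = random.Random(seed)
    best = {d: 0.5 for d in DIMS}                   # uniform initialization
    best_score = objective(best)
    for _ in range(steps):
        # Perturb each weight slightly, clipped to a valid sparsity range.
        cand = {d: min(0.9, max(0.0, w + rng.uniform(-0.1, 0.1)))
                for d, w in best.items()}
        score = objective(cand)
        if score < best_score:                      # accept only improvements
            best, best_score = cand, score
    return best, best_score
```

In the paper, the searched weights would then scale a sparsity regularizer during training (as in regularization-based slimming), rather than score a closed-form objective as in this sketch.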
Pages: 1451-1456
Page count: 6
References (related papers)
26 in total
[11] Hou Z. 2022 IEEE International Conference on Multimedia and Expo (ICME), 2022: 1.
[12] Jin Y. B. Communications in Computer and Information Science, 2021, 1516: 151. DOI: 10.1007/978-3-030-92307-5_18.
[13] Kong Z., Dong P., Ma X., Meng X., Niu W., Sun M., Shen X., Yuan G., Ren B., Tang H., Qin M., Wang Y. SPViT: Enabling Faster Vision Transformers via Latency-Aware Soft Token Pruning. Computer Vision, ECCV 2022, Pt. XI, 2022, 13671: 620-640.
[14] Krizhevsky A. CIFAR-10 (Canadian Institute for Advanced Research), 2010, 5: 1.
[15] Lin Y. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI 2022), 2022: 1173.
[16] Liu Z., Li J., Shen Z., Huang G., Yan S., Zhang C. Learning Efficient Convolutional Networks through Network Slimming. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 2755-2763.
[17] Metropolis N., Ulam S. The Monte Carlo Method. Journal of the American Statistical Association, 1949, 44(247): 335-341.
[18] Pan B. Advances in Neural Information Processing Systems, 2021, 34.
[19] Rao Y. M. 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 2021, 34.
[20] Touvron H. Proceedings of Machine Learning Research, 2021, 139: 7358.