ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Cited by: 1
Authors
Wang, Jiahao [1 ]
Cao, Mingdeng [1 ]
Shi, Shuwei [1 ]
Wu, Baoyuan [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Shenzhen Int Grad Sch, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Transformer; data-free; distillation;
DOI
10.1109/ICASSP43922.2022.9747484
CLC Number
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Vision transformers (ViTs) require intensive computational resources to achieve high performance, which usually makes them unsuitable for mobile devices. A feasible strategy is to compress them using the original training data, which may not be accessible due to privacy limitations or transmission restrictions. In this case, utilizing the massive unlabeled data in the wild is an alternative paradigm, which has been proven effective for compressing convolutional neural networks (CNNs). However, due to the significant differences in model structure and computation mechanism between CNNs and ViTs, it remains an open question whether a similar paradigm is suitable for ViTs. In this work, we propose to effectively compress ViTs using unlabeled data in the wild, in two stages. First, we design an effective tool for selecting valuable data from the wild, dubbed Attention Probe. Second, based on the selected data, we develop a probe knowledge distillation algorithm to train a lightweight student transformer by maximizing the similarities between the heavy teacher and the lightweight student models on both the outputs and the intermediate features. Extensive experimental results on several benchmarks demonstrate that the student transformer obtained by the proposed method can achieve performance comparable to the baseline that requires the original training data. Code is available at: https://github.com/IIGROUP/AttentionProbe.
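The two-stage method in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the entropy-based probe score, the loss weights (`alpha`), and the distillation temperature are assumptions made for clarity only.

```python
import numpy as np

def softmax(x, temp=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = (x / temp) - (x / temp).max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_probe_score(attn_maps):
    """Stage 1 (illustrative): score a wild sample by the entropy of the
    teacher's attention maps. Sharply focused (low-entropy) attention is
    taken here as a proxy for an informative sample.
    attn_maps: array of shape (heads, tokens, tokens), rows sum to 1."""
    p = np.clip(attn_maps, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=-1).mean()
    return -entropy  # higher score = more focused attention

def probe_distill_loss(t_logits, s_logits, t_feats, s_feats,
                       temp=4.0, alpha=0.5):
    """Stage 2 (illustrative): combine output-level KL distillation with
    intermediate-feature matching, as the abstract describes."""
    p = softmax(t_logits, temp)
    q = softmax(s_logits, temp)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1).mean()
    feat_mse = float(np.mean([np.mean((tf - sf) ** 2)
                              for tf, sf in zip(t_feats, s_feats)]))
    return alpha * (temp ** 2) * kl + (1.0 - alpha) * feat_mse
```

In practice the probe would rank a large pool of unlabeled wild images by score and keep the top fraction for distillation; the feature-matching term would typically be computed after projecting student features to the teacher's width.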
Pages: 2220 - 2224
Page count: 5
Related Papers
50 records in total
  • [21] Knowledge Distillation for Streaming Transformer-Transducer
    Kojima, Atsushi
    INTERSPEECH 2021, 2021, : 2841 - 2845
  • [22] Sleep-CMKD: Self-Attention CNN/Transformer Cross-Model Knowledge Distillation for Automatic Sleep Staging
    Kim, Hyounggyu
    Kim, Moogyeong
    Chung, Wonzoo
    2023 11TH INTERNATIONAL WINTER CONFERENCE ON BRAIN-COMPUTER INTERFACE, BCI, 2023,
  • [23] Video Summarization With Spatiotemporal Vision Transformer
    Hsu, Tzu-Chun
    Liao, Yi-Sheng
    Huang, Chun-Rong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3013 - 3026
  • [24] Self-slimmed Vision Transformer
    Zong, Zhuofan
    Li, Kunchang
    Song, Guanglu
    Wang, Yali
    Qiao, Yu
    Leng, Biao
    Liu, Yu
    COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 432 - 448
  • [25] SPVT: Spiked Pyramid Vision Transformer
    Guo, Yazhuo
    Qin, Yuhan
    Chen, Song
    Kang, Yi
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 110 - 113
  • [26] Transformer-coupled NMR probe
    Utsuzawa, Shin
    Mandal, Soumyajit
    Song, Yi-Qiao
    JOURNAL OF MAGNETIC RESONANCE, 2012, 216 : 128 - 133
  • [27] Representation Learning Based on Vision Transformer
    Ran, Ruisheng
    Gao, Tianyu
    Hu, Qianwei
    Zhang, Wenfeng
    Peng, Shunshun
    Fang, Bin
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (07)
  • [28] AttentionViz: A Global View of Transformer Attention
    Yeh, Catherine
    Chen, Yida
    Wu, Aoyu
    Chen, Cynthia
    Viegas, Fernanda
    Wattenberg, Martin
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (01) : 262 - 272
  • [29] Depth-Vision-Decoupled Transformer With Cascaded Group Convolutional Attention for Monocular 3-D Object Detection
    Xu, Yan
    Wang, Haoyuan
    Ji, Zhong
    Zhang, Qiyuan
    Jia, Qian
    Li, Xuening
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
  • [30] Attention Head Interactive Dual Attention Transformer for Hyperspectral Image Classification
    Shi, Cuiping
    Yue, Shuheng
    Wang, Liguo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 1