ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Cited by: 1
Authors
Wang, Jiahao [1 ]
Cao, Mingdeng [1 ]
Shi, Shuwei [1 ]
Wu, Baoyuan [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Shenzhen International Graduate School, Shenzhen, People's Republic of China
[2] The Chinese University of Hong Kong, Shenzhen, People's Republic of China
Source
2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Transformer; data-free; distillation;
DOI
10.1109/ICASSP43922.2022.9747484
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Code
070206; 082403
Abstract
Vision transformers (ViTs) require intensive computational resources to achieve high performance, which usually makes them unsuitable for mobile devices. A feasible strategy is to compress them using the original training data, which, however, may not be accessible due to privacy limitations or transmission restrictions. In this case, utilizing the massive unlabeled data in the wild is an alternative paradigm, which has been proven effective for compressing convolutional neural networks (CNNs). However, due to the significant differences in model structure and computation mechanism between CNNs and ViTs, it remains an open question whether a similar paradigm is suitable for ViTs. In this work, we propose a two-stage method to effectively compress ViTs using unlabeled data in the wild. First, we design an effective tool for selecting valuable data from the wild, dubbed the Attention Probe. Second, based on the selected data, we develop a probe knowledge distillation algorithm that trains a lightweight student transformer by maximizing the similarity of both the outputs and the intermediate features between the heavy teacher and the lightweight student models. Extensive experimental results on several benchmarks demonstrate that the student transformer obtained by the proposed method achieves performance comparable to the baseline that requires the original training data. Code is available at: https://github.com/IIGROUP/AttentionProbe.
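The abstract describes two stages: selecting valuable wild data with the Attention Probe, and a probe knowledge distillation loss that matches the student to the teacher on both outputs and intermediate features. Below is a minimal PyTorch-style sketch of what such a pipeline could look like; it is not the authors' released implementation (see the GitHub link above), and the function names (attention_probe_score, probe_distill_loss), the entropy-based selection score, and the loss weights are illustrative assumptions.

```python
# Minimal sketch of attention-probe data selection and probe distillation.
# NOT the authors' code; names, the entropy-based score, and weights are assumptions.
import torch
import torch.nn.functional as F


def attention_probe_score(cls_attn: torch.Tensor) -> torch.Tensor:
    """Score wild images by how focused the teacher's [CLS] attention is.

    cls_attn: (batch, heads, tokens) attention weights from the [CLS] query
    of a chosen teacher layer. Lower entropy = more concentrated attention,
    used here as a proxy for "valuable" data (assumption).
    """
    attn = cls_attn.mean(dim=1)  # average over heads -> (batch, tokens)
    entropy = -(attn * attn.clamp_min(1e-8).log()).sum(dim=-1)
    return -entropy  # higher score = more focused attention


def probe_distill_loss(student_logits, teacher_logits,
                       student_feats, teacher_feats,
                       tau: float = 4.0, alpha: float = 1.0, beta: float = 1.0):
    """Match the student to the teacher on outputs and intermediate features."""
    # Output-level distillation: KL between softened distributions.
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    # Feature-level distillation: cosine distance between paired intermediate
    # features (a projection would be needed if dimensions differ; omitted).
    feat = sum(1.0 - F.cosine_similarity(s.flatten(1), t.flatten(1), dim=-1).mean()
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
    return alpha * kd + beta * feat
```

In this sketch, the wild dataset would first be ranked by attention_probe_score using the frozen teacher, the top-scoring images kept, and the student then trained on them with probe_distill_loss; the exact selection criterion and feature-matching layers in the paper may differ.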
Pages: 2220-2224
Number of pages: 5
Related Papers
50 records in total
  • [41] Ji, Ge-Peng; Zhuge, Mingchen; Gao, Dehong; Fan, Deng-Ping; Sakaridis, Christos; Van Gool, Luc. Masked Vision-language Transformer in Fashion. Machine Intelligence Research, 2023, 20: 421-434.
  • [42] Tu, Zhengzhong; Talebi, Hossein; Zhang, Han; Yang, Feng; Milanfar, Peyman; Bovik, Alan; Li, Yinxiao. MaxViT: Multi-axis Vision Transformer. Computer Vision, ECCV 2022, Pt XXIV, 2022, 13684: 459-479.
  • [43] Fogarollo, Stefano; Bale, Reto; Harders, Matthias. Towards liver segmentation in the wild via contrastive distillation. International Journal of Computer Assisted Radiology and Surgery, 2023, 18(07): 1143-1149.
  • [44] Lv, Yuanhai; Wang, Chongyan; Yuan, Wanteng; Qian, Xiaohao; Yang, Wujun; Zhao, Wanqing. Transformer-Based Distillation Hash Learning for Image Retrieval. Electronics, 2022, 11(18).
  • [45] Sekar, S.; Saravanan, R. Experimental studies on absorption heat transformer coupled distillation system. Desalination, 2011, 274(1-3): 292-301.
  • [46] Chien, Jen-Tzung; Huang, Yu-Han. Bayesian Transformer Using Disentangled Mask Attention. Interspeech 2022, 2022: 1761-1765.
  • [47] Wu, Kun; Yang, Xiaomin; Nie, Zihao; Li, Haoran; Jeon, Gwanggil. A Dual-Attention Transformer Network for Pansharpening. IEEE Sensors Journal, 2024, 24(05): 5500-5511.
  • [48] An, Dong; Zhang, Fan; Zhao, Yuqian; Luo, Biao; Yang, Chunhua; Chen, Baifan; Yu, Lingli. MTAtrack: Multilevel transformer attention for visual tracking. Optics and Laser Technology, 2023, 166.
  • [49] Zhao, Yingzhu; Ni, Chongjia; Leung, Cheung-Chi; Joty, Shafiq; Chng, Eng Siong; Ma, Bin. Cross Attention with Monotonic Alignment for Speech Transformer. Interspeech 2020, 2020: 5031-5035.
  • [50] Zhao, Huali; Crane, Martin; Bezbradica, Marija. Attention! Transformer with Sentiment on Cryptocurrencies Price Prediction. Proceedings of the 7th International Conference on Complexity, Future Information Systems and Risk (COMPLEXIS), 2022: 98-104.