ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD

Cited by: 1
Authors
Wang, Jiahao [1 ]
Cao, Mingdeng [1 ]
Shi, Shuwei [1 ]
Wu, Baoyuan [2 ]
Yang, Yujiu [1 ]
Affiliations
[1] Tsinghua Shenzhen International Graduate School, Shenzhen, People's Republic of China
[2] The Chinese University of Hong Kong, Shenzhen, People's Republic of China
Source
2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Transformer; data-free; distillation;
DOI
10.1109/ICASSP43922.2022.9747484
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Code
070206; 082403
Abstract
Vision transformers (ViTs) require intensive computational resources to achieve high performance, which usually makes them unsuitable for mobile devices. A feasible strategy is to compress them using the original training data, which, however, may not be accessible due to privacy limitations or transmission restrictions. In this case, utilizing the massive unlabeled data in the wild is an alternative paradigm, which has been proven effective for compressing convolutional neural networks (CNNs). However, due to the significant differences in model structure and computation mechanism between CNNs and ViTs, it remains an open question whether a similar paradigm is suitable for ViTs. In this work, we propose a two-stage method to effectively compress ViTs using unlabeled data in the wild. First, we design an effective tool for selecting valuable data from the wild, dubbed the Attention Probe. Second, based on the selected data, we develop a probe knowledge distillation algorithm that trains a lightweight student transformer by maximizing the similarity of both the outputs and the intermediate features between the heavy teacher and the lightweight student models. Extensive experimental results on several benchmarks demonstrate that the student transformer obtained by the proposed method achieves performance comparable to the baseline that requires the original training data. Code is available at: https://github.com/IIGROUP/AttentionProbe.
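The abstract describes two stages: selecting valuable wild data with the Attention Probe, and a probe knowledge distillation loss that matches the student to the teacher on both outputs and intermediate features. Below is a minimal PyTorch-style sketch of what such a pipeline could look like; it is not the authors' released implementation (see the GitHub link above), and the function names (attention_probe_score, probe_distill_loss), the entropy-based selection score, and the loss weights are illustrative assumptions.

```python
# Minimal sketch of attention-probe data selection and probe distillation.
# NOT the authors' code; names, the entropy-based score, and weights are assumptions.
import torch
import torch.nn.functional as F


def attention_probe_score(cls_attn: torch.Tensor) -> torch.Tensor:
    """Score wild images by how focused the teacher's [CLS] attention is.

    cls_attn: (batch, heads, tokens) attention weights from the [CLS] query
    of a chosen teacher layer. Lower entropy = more concentrated attention,
    used here as a proxy for "valuable" data (assumption).
    """
    attn = cls_attn.mean(dim=1)  # average over heads -> (batch, tokens)
    entropy = -(attn * attn.clamp_min(1e-8).log()).sum(dim=-1)
    return -entropy  # higher score = more focused attention


def probe_distill_loss(student_logits, teacher_logits,
                       student_feats, teacher_feats,
                       tau: float = 4.0, alpha: float = 1.0, beta: float = 1.0):
    """Match the student to the teacher on outputs and intermediate features."""
    # Output-level distillation: KL between softened distributions.
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau * tau
    # Feature-level distillation: cosine distance between paired intermediate
    # features (a projection would be needed if dimensions differ; omitted).
    feat = sum(1.0 - F.cosine_similarity(s.flatten(1), t.flatten(1), dim=-1).mean()
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)
    return alpha * kd + beta * feat
```

In this sketch, the wild dataset would first be ranked by attention_probe_score using the frozen teacher, the top-scoring images kept, and the student then trained on them with probe_distill_loss; the exact selection criterion and feature-matching layers in the paper may differ.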
Pages: 2220-2224
Number of pages: 5
Related Papers
50 records in total
  • [41] Ji, Ge-Peng; Zhuge, Mingchen; Gao, Dehong; Fan, Deng-Ping; Sakaridis, Christos; Van Gool, Luc. Masked Vision-language Transformer in Fashion. Machine Intelligence Research, 2023, 20: 421-434.
  • [42] Tu, Zhengzhong; Talebi, Hossein; Zhang, Han; Yang, Feng; Milanfar, Peyman; Bovik, Alan; Li, Yinxiao. MaxViT: Multi-axis Vision Transformer. Computer Vision, ECCV 2022, Pt XXIV, 2022, 13684: 459-479.
  • [43] Fogarollo, Stefano; Bale, Reto; Harders, Matthias. Towards liver segmentation in the wild via contrastive distillation. International Journal of Computer Assisted Radiology and Surgery, 2023, 18(07): 1143-1149.
  • [44] Lv, Yuanhai; Wang, Chongyan; Yuan, Wanteng; Qian, Xiaohao; Yang, Wujun; Zhao, Wanqing. Transformer-Based Distillation Hash Learning for Image Retrieval. Electronics, 2022, 11(18).
  • [45] Sekar, S.; Saravanan, R. Experimental studies on absorption heat transformer coupled distillation system. Desalination, 2011, 274(1-3): 292-301.
  • [46] Chien, Jen-Tzung; Huang, Yu-Han. Bayesian Transformer Using Disentangled Mask Attention. Interspeech 2022, 2022: 1761-1765.
  • [47] Wu, Kun; Yang, Xiaomin; Nie, Zihao; Li, Haoran; Jeon, Gwanggil. A Dual-Attention Transformer Network for Pansharpening. IEEE Sensors Journal, 2024, 24(05): 5500-5511.
  • [48] An, Dong; Zhang, Fan; Zhao, Yuqian; Luo, Biao; Yang, Chunhua; Chen, Baifan; Yu, Lingli. MTAtrack: Multilevel transformer attention for visual tracking. Optics and Laser Technology, 2023, 166.
  • [49] Zhao, Yingzhu; Ni, Chongjia; Leung, Cheung-Chi; Joty, Shafiq; Chng, Eng Siong; Ma, Bin. Cross Attention with Monotonic Alignment for Speech Transformer. Interspeech 2020, 2020: 5031-5035.
  • [50] Zhao, Huali; Crane, Martin; Bezbradica, Marija. Attention! Transformer with Sentiment on Cryptocurrencies Price Prediction. Proceedings of the 7th International Conference on Complexity, Future Information Systems and Risk (COMPLEXIS), 2022: 98-104.