A Survey of Applications of Vision Transformer and Its Variants

Cited by: 2
Authors
Wu, Chuang [1]
He, Tingqin [1]
Affiliations
[1] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Hunan Key Lab Serv Comp & Novel Software Technol, Xiangtan, Peoples R China
Source
PROCEEDINGS OF THE 2024 IEEE 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS 2024 | 2024
Keywords
Transformer; Computer Vision; Self-attention; High-level vision; Low-level vision; INDUSTRIAL INTERNET; SECURE
DOI
10.1109/IDS62739.2024.00011
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Transformer architecture, renowned for its efficacy in natural language processing, encounters unique hurdles when applied to computer vision. In response, the Vision Transformer (ViT) emerges as a successful adaptation for image classification tasks. While ViT exhibits tremendous potential in revolutionizing computer vision, addressing its inherent challenges and limitations stands as a critical endeavor. This comprehensive survey meticulously scrutinizes the drawbacks associated with ViT, proposing bespoke adaptations tailored to specific applications while showcasing their remarkable performance across diverse visual tasks. Moreover, it delves into the evolution of ViT adaptations across various visual domains, elucidating four promising directions for future research and development in this dynamic field.
Pages: 21-25
Page count: 5