A Survey of Applications of Vision Transformer and Its Variants

Cited by: 2

Authors:

Wu, Chuang [1]

He, Tingqin [1]

Affiliations:

[1] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Hunan Key Lab Serv Comp & Novel Software Technol, Xiangtan, Peoples R China

Source:

PROCEEDINGS OF THE 2024 IEEE 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS 2024 | 2024

Keywords:

Transformer; Computer Vision; Self-attention; High-level vision; Low-level vision; INDUSTRIAL INTERNET; SECURE;

DOI:

10.1109/IDS62739.2024.00011

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Transformer architecture, renowned for its efficacy in natural language processing, encounters unique hurdles when applied to computer vision. In response, the Vision Transformer (ViT) emerges as a successful adaptation for image classification tasks. While ViT exhibits tremendous potential in revolutionizing computer vision, addressing its inherent challenges and limitations stands as a critical endeavor. This comprehensive survey meticulously scrutinizes the drawbacks associated with ViT, proposing bespoke adaptations tailored to specific applications while showcasing their remarkable performance across diverse visual tasks. Moreover, it delves into the evolution of ViT adaptations across various visual domains, elucidating four promising directions for future research and development in this dynamic field.

Pages: 21-25

Page count: 5

References: 57 total
