Vision Transformer (ViT)-based Applications in Image Classification

Cited by: 7
Authors
Huo, Yingzi [1]
Jin, Kai [2]
Cai, Jiahong [1]
Xiong, Huixuan [1]
Pang, Jiacheng [1]
Affiliations
[1] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Hunan Key Lab Serv Comp & Novel Software Technol, Xiangtan 411201, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410002, Peoples R China
Source
2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS | 2023
Keywords
CNN; image classification; token; vision transformer; Vision Reservoir
DOI
10.1109/BigDataSecurity-HPSC-IDS58521.2023.00033
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the ViT model has been widely used in computer vision, especially for image classification. This paper summarizes the application of ViT to image classification. It first introduces the image classification implementation process and the basic architecture of the ViT model. It then analyzes and summarizes image classification methods, including traditional methods, CNN-based methods, and ViT-based methods, and provides a comparative analysis of CNN and ViT. Finally, it outlines the application prospects and future development of ViT in image classification, along with some shortcomings of ViT and their solutions.
Pages: 135-140
Page count: 6
Related papers
50 in total
  • [21] CSiT: A Multiscale Vision Transformer for Hyperspectral Image Classification
    He, Wenxuan
    Huang, Weiliang
    Liao, Shuhong
    Xu, Zhen
    Yan, Jingwen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 9266 - 9277
  • [22] Network Intrusion Detection Based on Feature Image and Deformable Vision Transformer Classification
    He, Kan
    Zhang, Wei
    Zong, Xuejun
    Lian, Lian
    IEEE ACCESS, 2024, 12 : 44335 - 44350
  • [23] Compressed-Domain Vision Transformer for Image Classification
    Ji, Ruolei
    Karam, Lina J.
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2024, 14 (02) : 299 - 310
  • [24] Embedded-ViT: A Framework for Embedded Deployment of Vision-Transformer in Medical Applications
    Ostrowski, Erik
    Shafique, Muhammad
    ADVANCES IN VISUAL COMPUTING, ISVC 2024, PT II, 2025, 15047 : 371 - 382
  • [25] A Hyperspectral Image Classification Method Based on Adaptive Spectral Spatial Kernel Combined with Improved Vision Transformer
    Wang, Aili
    Xing, Shuang
    Zhao, Yan
    Wu, Haibin
    Iwahori, Yuji
    REMOTE SENSING, 2022, 14 (15)
  • [26] Plant-CNN-ViT: Plant Classification with Ensemble of Convolutional Neural Networks and Vision Transformer
    Lee, Chin Poo
    Lim, Kian Ming
    Song, Yu Xuan
    Alqahtani, Ali
    PLANTS-BASEL, 2023, 12 (14)
  • [27] CFFI-Vit: Enhanced Vision Transformer for the Accurate Classification of Fish Feeding Intensity in Aquaculture
    Liu, Jintao
    Becerra, Alfredo Tolon
    Bienvenido-Barcena, Jose Fernando
    Yang, Xinting
    Zhao, Zhenxi
    Zhou, Chao
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (07)
  • [28] A Multi-perspective Squeeze Excitation Classifier Based on Vision Transformer for Few Shot Image Classification
    Zhang, Zebao
    Li, Yuzhao
    He, Ming
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 80 - 92
  • [29] ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset
    Abbas, Farhat
    Yasmin, Mussarat
    Fayyaz, Muhammad
    Asim, Usman
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (04) : 1805 - 1819