Vision Transformer (ViT)-based Applications in Image Classification

Cited by: 7
Authors
Huo, Yingzi [1]
Jin, Kai [2]
Cai, Jiahong [1]
Xiong, Huixuan [1]
Pang, Jiacheng [1]
Affiliations
[1] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Hunan Key Lab Serv Comp & Novel Software Technol, Xiangtan 411201, Peoples R China
[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410002, Peoples R China
Source
2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS | 2023
Keywords
CNN; image classification; token; vision transformer; Vision Reservoir;
DOI
10.1109/BigDataSecurity-HPSC-IDS58521.2023.00033
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Vision Transformer (ViT) model has been widely used in computer vision, especially for image classification. This paper surveys the application of ViT to image classification. It first introduces the image classification pipeline and the basic architecture of the ViT model. It then reviews image classification methods, covering traditional methods, CNN-based methods, and ViT-based methods, and provides a comparative analysis of CNN and ViT. Finally, it outlines the application prospects and future development of ViT in image classification, along with some shortcomings of ViT and their solutions.
Pages: 135-140
Page count: 6
Related papers
50 in total
  • [41] Effects of JPEG Compression on Vision Transformer Image Classification for Encryption-then-Compression Images
    Hamano, Genki
    Imaizumi, Shoko
    Kiya, Hitoshi
    SENSORS, 2023, 23 (07)
  • [42] GFPE-ViT: vision transformer with geometric-fractal-based position encoding
    Wang, Lei
    Tang, Xue-song
    Hao, Kuangrong
    VISUAL COMPUTER, 2025, 41 (02) : 1021 - 1036
  • [43] P-Vit: A Simplified Vision Transformer Model Based on FFN and Simple Attention
    Hu, Wei
    Hu, Mingce
    Liu, Fang
    Han, Yi
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT V, KSEM 2024, 2024, 14888 : 316 - 326
  • [44] Improving vision transformer for medical image classification via token-wise perturbation
    Li, Yuexiang
    Huang, Yawen
    He, Nanjun
    Ma, Kai
    Zheng, Yefeng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 98
  • [45] IEViT: An enhanced vision transformer architecture for chest X-ray image classification
    Okolo, Gabriel Iluebe
    Katsigiannis, Stamos
    Ramzan, Naeem
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2022, 226
  • [46] RViT: Robust Fusion Vision Transformer with Variational Hierarchical Denoising Process for Image Classification
    Lin, Zhenghong
    Wu, Yuze
    Chen, Jiawei
    Wang, Shiping
    GUIDANCE NAVIGATION AND CONTROL, 2024, 04 (03)
  • [47] CE-ViT: A Robust Channel Estimator Based on Vision Transformer for OFDM Systems
    Liu, Fangyu
    Zhang, Jing
    Jiang, Peiwen
    Wen, Chao-Kai
    Jin, Shi
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023 : 4798 - 4803
  • [48] Vision transformer based classification of sewer defects weighted loss model
    Ji, Chunhou
    Xie, Zhiqiang
    Li, Rong
    Yang, Zhibing
    Hou, Zhiqun
    TUNNELLING AND UNDERGROUND SPACE TECHNOLOGY, 2025, 156
  • [49] Air Quality Classification and Measurement Based on Double Output Vision Transformer
    Wang, Zhenyu
    Yang, Yingdong
    Yue, Shaolong
    IEEE INTERNET OF THINGS JOURNAL, 2022, 9 (21) : 20975 - 20984
  • [50] Causal-ViT: Robust Vision Transformer by causal intervention
    Li, Wei
    Li, Zhixin
    Yang, Xiwei
    Ma, Huifang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126