Vision Transformer (ViT)-based Applications in Image Classification

Cited by: 7

Authors:

Huo, Yingzi [1]

Jin, Kai [2]

Cai, Jiahong [1]

Xiong, Huixuan [1]

Pang, Jiacheng [1]

Affiliations:

[1] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Hunan Key Lab Serv Comp & Novel Software Technol, Xiangtan 411201, Peoples R China

[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410002, Peoples R China

Source:

2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS | 2023

Keywords:

CNN; image classification; token; vision transformer; Vision Reservoir;

DOI:

10.1109/BigDataSecurity-HPSC-IDS58521.2023.00033

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In recent years, the Vision Transformer (ViT) model has been widely used in computer vision, especially for image classification. This paper summarizes the application of ViT to image classification tasks. It first introduces the image classification implementation process and the basic architecture of the ViT model. It then analyzes and summarizes image classification methods, covering traditional methods, CNN-based methods, and ViT-based methods, and provides a comparative analysis of CNN and ViT. Finally, the paper outlines the application prospects and future development of ViT in image classification, along with some shortcomings of ViT and their solutions.


Pages: 135-140

Page count: 6
