Vision Transformer (ViT)-based Applications in Image Classification

Cited by: 7

Authors:

Huo, Yingzi [1]

Jin, Kai [2]

Cai, Jiahong [1]

Xiong, Huixuan [1]

Pang, Jiacheng [1]

Affiliations:

[1] Hunan Univ Sci & Technol, Sch Comp Sci & Engn, Hunan Key Lab Serv Comp & Novel Software Technol, Xiangtan 411201, Peoples R China

[2] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410002, Peoples R China

Source:

2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS | 2023

Keywords:

CNN; image classification; token; vision transformer; Vision Reservoir

DOI:

10.1109/BigDataSecurity-HPSC-IDS58521.2023.00033

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, the Vision Transformer (ViT) model has been widely used in computer vision, especially for image classification. This paper summarizes the application of ViT to image classification. It first introduces the image classification implementation process and the basic architecture of the ViT model. It then analyzes and summarizes image classification methods, including traditional methods, CNN-based methods, and ViT-based methods, and provides a comparative analysis of CNN and ViT. Finally, it outlines the application prospects and future development of ViT in image classification, along with some shortcomings of ViT and their solutions.

Pages: 135-140

Page count: 6

50 related records in total

[21] CSiT: A Multiscale Vision Transformer for Hyperspectral Image Classification
He, Wenxuan
Huang, Weiliang
Liao, Shuhong
Xu, Zhen
Yan, Jingwen
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 9266 - 9277
[22] Network Intrusion Detection Based on Feature Image and Deformable Vision Transformer Classification
He, Kan
Zhang, Wei
Zong, Xuejun
Lian, Lian
IEEE ACCESS, 2024, 12 : 44335 - 44350
[23] Compressed-Domain Vision Transformer for Image Classification
Ji, Ruolei
Karam, Lina J.
IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2024, 14 (02) : 299 - 310
[24] Embedded-ViT: A Framework for Embedded Deployment of Vision-Transformer in Medical Applications
Ostrowski, Erik
Shafique, Muhammad
ADVANCES IN VISUAL COMPUTING, ISVC 2024, PT II, 2025, 15047 : 371 - 382
[25] A Hyperspectral Image Classification Method Based on Adaptive Spectral Spatial Kernel Combined with Improved Vision Transformer
Wang, Aili
Xing, Shuang
Zhao, Yan
Wu, Haibin
Iwahori, Yuji
REMOTE SENSING, 2022, 14 (15)
[26] Plant-CNN-ViT: Plant Classification with Ensemble of Convolutional Neural Networks and Vision Transformer
Lee, Chin Poo
Lim, Kian Ming
Song, Yu Xuan
Alqahtani, Ali
PLANTS-BASEL, 2023, 12 (14)
[27] CFFI-Vit: Enhanced Vision Transformer for the Accurate Classification of Fish Feeding Intensity in Aquaculture
Liu, Jintao
Becerra, Alfredo Tolon
Bienvenido-Barcena, Jose Fernando
Yang, Xinting
Zhao, Zhenxi
Zhou, Chao
JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (07)
[28] A Multi-perspective Squeeze Excitation Classifier Based on Vision Transformer for Few Shot Image Classification
Zhang, Zebao
Li, Yuzhao
He, Ming
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 80 - 92
[29] ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset
Abbas, Farhat
Yasmin, Mussarat
Fayyaz, Muhammad
Asim, Usman
PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (04) : 1805 - 1819