Performance Comparison of Vision Transformer-Based Models in Medical Image Classification

Cited by: 1
Authors
Kanca, Elif [1 ]
Ayas, Selen [2 ]
Kablan, Elif Baykal [1 ]
Ekinci, Murat [2 ]
Affiliations
[1] Karadeniz Tech Univ, Software Engineering, Trabzon, Turkiye
[2] Karadeniz Tech Univ, Computer Engineering, Trabzon, Turkiye
Source
2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU | 2023
Keywords
Vision transformer-based models; transformers; medical image classification
DOI
10.1109/SIU59756.2023.10223892
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, convolutional neural networks have shown significant success and are frequently used in medical image analysis applications. However, the convolution operation restricts the learning of long-range pixel dependencies to the local receptive field. Inspired by the success of transformer architectures in encoding long-range dependencies and learning more efficient feature representations in natural language processing, this study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-Efficient Image Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models, and compares their classification performance. The results show that the highest accuracy values are obtained with the DeiT model at 96.5% on the chest X-ray dataset, the PVTv2 model at 91.6% on the breast histology dataset, the PVTv2 model at 91.3% on the retina fundus dataset, and the Swin model at 91.0% on the skin lesion dataset.
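The abstract's core idea, replacing local convolutions with patch tokens that can attend to one another globally, rests on the ViT patch-embedding step. The sketch below illustrates that step only; the patch size, token width, and random projection weights are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def patch_embed(image, patch=16, dim=768, seed=0):
    """Split an HxWxC image into non-overlapping patches and project
    each flattened patch to a `dim`-wide token (ViT-style embedding).
    Random weights stand in for the learned projection."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    # reshape into (num_patches, patch*patch*c): one row per patch
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))
    proj = np.random.default_rng(seed).standard_normal(
        (patch * patch * c, dim)) * 0.02
    return patches @ proj  # (num_tokens, dim)

tokens = patch_embed(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): a 14x14 grid of patch tokens
```

Unlike a convolution, whose receptive field is local, the self-attention layers applied to these 196 tokens relate every patch to every other patch in a single step, which is the long-range dependency modeling the abstract refers to.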
Pages: 4
Related Papers
50 records in total
  • [31] Hybrid Swin Transformer-Based Classification of Gaze Target Regions
    Wu, Gongpu
    Wang, Changyuan
    Gao, Lina
    Xue, Jinna
    IEEE ACCESS, 2023, 11 : 132055 - 132067
  • [32] DUAL TRANSFORMER ENCODER MODEL FOR MEDICAL IMAGE CLASSIFICATION
    Yan, Fangyuan
    Yan, Bin
    Pei, Mingtao
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 690 - 694
  • [33] LViT: Language Meets Vision Transformer in Medical Image Segmentation
    Li, Zihan
    Li, Yunxiang
    Li, Qingde
    Wang, Puyang
    Guo, Dazhou
    Lu, Le
    Jin, Dakai
    Zhang, You
    Hong, Qingqi
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (01) : 96 - 107
  • [34] Vision Transformer-Based Anomaly Detection in Smart Grid Phasor Measurement Units Using Deep Learning Models
    Liu, Zhibin
    Wang, Yibo
    Wang, Qingwei
    Hu, Man
    IEEE ACCESS, 2025, 13 : 44565 - 44576
  • [35] FSwin Transformer: Feature-Space Window Attention Vision Transformer for Image Classification
    Yoo, Dayeon
    Kim, Jeesu
    Yoo, Jinwoo
    IEEE ACCESS, 2024, 12 : 72598 - 72606
  • [36] AMMU: A survey of transformer-based biomedical pretrained language models
    Kalyan, Katikapalli Subramanyam
    Rajasekharan, Ajit
    Sangeetha, Sivanesan
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 126
  • [37] Transformer-based language models for mental health issues: A survey
    Greco, Candida M.
    Simeri, Andrea
    Tagarelli, Andrea
    Zumpano, Ester
    PATTERN RECOGNITION LETTERS, 2023, 167 : 204 - 211
  • [38] Transformer-Based Models for the Automatic Indexing of Scientific Documents in French
    Angel Gonzalez, Jose
    Buscaldi, Davide
    Sanchis, Emilio
    Hurtado, Lluis-F
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 60 - 72
  • [39] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)