In recent years, convolutional neural networks have achieved significant success and are widely used in medical image analysis applications. However, the convolution operation restricts the learning of long-range pixel dependencies to the local receptive field. Inspired by the success of transformer architectures in encoding long-range dependencies and learning more effective feature representations in natural language processing, this study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models and compares their classification performance. The results show that the highest accuracy values are obtained with the DeiT model at 96.5% on the chest X-ray dataset, the PVTv2 model at 91.6% on the breast histology dataset, the PVTv2 model at 91.3% on the retina fundus dataset, and the Swin Transformer model at 91.0% on the skin lesion dataset.
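
A comparative setup of this kind can be illustrated with a minimal sketch, assuming the timm library and its standard pretrained model identifiers (vit_base_patch16_224, deit_base_patch16_224, swin_base_patch4_window7_224, pvt_v2_b2) as stand-ins for the four backbones; the exact model variants, hyperparameters, and preprocessing used in the study are not specified here, and the two-class setting is only an example.

```python
# Hypothetical comparison harness: fine-tune each transformer backbone and
# measure held-out accuracy. Model names and num_classes are assumptions,
# not the paper's exact configuration.
import timm
import torch
from torch import nn

MODEL_NAMES = [
    "vit_base_patch16_224",          # ViT
    "deit_base_patch16_224",         # DeiT
    "swin_base_patch4_window7_224",  # Swin Transformer
    "pvt_v2_b2",                     # PVTv2
]

def build_classifier(name: str, num_classes: int = 2) -> nn.Module:
    # timm swaps the pretrained head for a new classification layer
    # sized to the target dataset.
    return timm.create_model(name, pretrained=True, num_classes=num_classes)

@torch.no_grad()
def accuracy(model: nn.Module, loader) -> float:
    # Top-1 accuracy over a DataLoader yielding (images, labels) batches.
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```

In such a harness, each backbone would be fine-tuned separately on every dataset and the resulting test accuracies compared, mirroring the per-dataset comparison summarized above.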