Performance Comparison of Vision Transformer-Based Models in Medical Image Classification

Cited by: 1
Authors
Kanca, Elif [1 ]
Ayas, Selen [2 ]
Kablan, Elif Baykal [1 ]
Ekinci, Murat [2 ]
Affiliations
[1] Karadeniz Tech Univ, Dept of Software Engineering, Trabzon, Turkiye
[2] Karadeniz Tech Univ, Dept of Computer Engineering, Trabzon, Turkiye
Source
2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU | 2023
Keywords
Vision transformer-based models; transformers; medical image classification
DOI
10.1109/SIU59756.2023.10223892
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In recent years, convolutional neural networks have shown significant success and are frequently used in medical image analysis applications. However, the convolution operation restricts the learning of long-range pixel dependencies to the local receptive field. Inspired by the success of transformer architectures at encoding long-range dependencies and learning more efficient feature representations in natural language processing, this study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models, and compares their classification performance. The results show that the highest accuracies are obtained with the DeiT model at 96.5% on the chest X-ray dataset, the PVTv2 model at 91.6% on the breast histology dataset, the PVTv2 model at 91.3% on the retina fundus dataset, and the Swin Transformer at 91.0% on the skin lesion dataset.
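The abstract's motivation — that self-attention gives every image patch a global receptive field, whereas convolution is confined to a local window — can be illustrated with a minimal NumPy sketch. This is not the authors' code: the patch size, embedding dimension, and random projections below are illustrative assumptions, showing only the single-head attention step shared by ViT-style models.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patchify(img, patch):
    # Split a (H, W) image into non-overlapping patch x patch blocks,
    # flattening each block into one token (as in ViT's patch embedding).
    H, W = img.shape
    rows, cols = H // patch, W // patch
    return (img[:rows * patch, :cols * patch]
            .reshape(rows, patch, cols, patch)
            .swapaxes(1, 2)
            .reshape(rows * cols, patch * patch))

def self_attention(tokens, d_k=16, seed=0):
    # Single-head scaled dot-product attention with random (untrained)
    # query/key/value projections -- enough to show the connectivity pattern.
    rng = np.random.default_rng(seed)
    d = tokens.shape[1]
    Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (n_tokens, n_tokens)
    return attn @ V, attn

# Toy 32x32 "image" split into 8x8 patches -> 16 tokens of dimension 64.
img = np.arange(32 * 32, dtype=float).reshape(32, 32) / 1024.0
tokens = patchify(img, patch=8)
out, attn = self_attention(tokens)

# Every row of attn assigns nonzero weight to every patch: each token
# aggregates information from the whole image in one layer, unlike a
# convolution, which only mixes pixels inside its local kernel window.
print(attn.shape)       # (16, 16)
print(bool(np.all(attn > 0)))
```

In a trained ViT, DeiT, Swin, or PVTv2 model the projections are learned and attention is multi-headed (Swin additionally restricts it to shifted windows, and PVTv2 downsamples keys/values), but the global token-to-token connectivity shown here is the property the abstract contrasts with the CNN's local receptive field.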
Pages: 4