Performance Comparison of Vision Transformer-Based Models in Medical Image Classification

Cited by: 1
Authors
Kanca, Elif [1 ]
Ayas, Selen [2 ]
Kablan, Elif Baykal [1 ]
Ekinci, Murat [2 ]
Affiliations
[1] Karadeniz Tech Univ, Software Engineering, Trabzon, Turkiye
[2] Karadeniz Tech Univ, Computer Engineering, Trabzon, Turkiye
Source
2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU | 2023
Keywords
Vision transformer-based models; transformers; medical image classification
DOI
10.1109/SIU59756.2023.10223892
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, convolutional neural networks have shown significant success and are frequently used in medical image analysis applications. However, the convolution operation restricts the learning of long-range pixel dependencies to the local receptive field. Inspired by the success of transformer architectures in encoding long-range dependencies and learning more efficient feature representations in natural language processing, this study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-Efficient Image Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models, and compares their classification performance. The results show that the highest accuracy values are obtained with the DeiT model at 96.5% on the chest X-ray dataset, the PVTv2 model at 91.6% on the breast histology dataset, the PVTv2 model at 91.3% on the retina fundus dataset, and the Swin model at 91.0% on the skin lesion dataset.
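The abstract's core idea, replacing local convolutions with patch tokens that can attend to one another globally, rests on the ViT patch-embedding step. The sketch below illustrates that step only; the patch size, token width, and random projection weights are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def patch_embed(image, patch=16, dim=768, seed=0):
    """Split an HxWxC image into non-overlapping patches and project
    each flattened patch to a `dim`-wide token (ViT-style embedding).
    Random weights stand in for the learned projection."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    # reshape into (num_patches, patch*patch*c): one row per patch
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))
    proj = np.random.default_rng(seed).standard_normal(
        (patch * patch * c, dim)) * 0.02
    return patches @ proj  # (num_tokens, dim)

tokens = patch_embed(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): a 14x14 grid of patch tokens
```

Unlike a convolution, whose receptive field is local, the self-attention layers applied to these 196 tokens relate every patch to every other patch in a single step, which is the long-range dependency modeling the abstract refers to.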
Pages: 4
Related Papers
50 records in total
  • [31] Hybrid Swin Transformer-Based Classification of Gaze Target Regions
    Wu, Gongpu
    Wang, Changyuan
    Gao, Lina
    Xue, Jinna
    IEEE ACCESS, 2023, 11 : 132055 - 132067
  • [32] DUAL TRANSFORMER ENCODER MODEL FOR MEDICAL IMAGE CLASSIFICATION
    Yan, Fangyuan
    Yan, Bin
    Pei, Mingtao
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 690 - 694
  • [33] LViT: Language Meets Vision Transformer in Medical Image Segmentation
    Li, Zihan
    Li, Yunxiang
    Li, Qingde
    Wang, Puyang
    Guo, Dazhou
    Lu, Le
    Jin, Dakai
    Zhang, You
    Hong, Qingqi
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (01) : 96 - 107
  • [34] Vision Transformer-Based Anomaly Detection in Smart Grid Phasor Measurement Units Using Deep Learning Models
    Liu, Zhibin
    Wang, Yibo
    Wang, Qingwei
    Hu, Man
    IEEE ACCESS, 2025, 13 : 44565 - 44576
  • [35] FSwin Transformer: Feature-Space Window Attention Vision Transformer for Image Classification
    Yoo, Dayeon
    Kim, Jeesu
    Yoo, Jinwoo
    IEEE ACCESS, 2024, 12 : 72598 - 72606
  • [36] AMMU: A survey of transformer-based biomedical pretrained language models
    Kalyan, Katikapalli Subramanyam
    Rajasekharan, Ajit
    Sangeetha, Sivanesan
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 126
  • [37] Transformer-based language models for mental health issues: A survey
    Greco, Candida M.
    Simeri, Andrea
    Tagarelli, Andrea
    Zumpano, Ester
    PATTERN RECOGNITION LETTERS, 2023, 167 : 204 - 211
  • [38] Transformer-Based Models for the Automatic Indexing of Scientific Documents in French
    Angel Gonzalez, Jose
    Buscaldi, Davide
    Sanchis, Emilio
    Hurtado, Lluis-F
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 60 - 72
  • [39] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)