Performance Comparison of Vision Transformer-Based Models in Medical Image Classification

Cited by: 1

Authors:

Kanca, Elif [1]

Ayas, Selen [2]

Kablan, Elif Baykal [1]

Ekinci, Murat [2]

Affiliations:

[1] Karadeniz Tech Univ, Yazilim Muhendisligi, Trabzon, Turkiye

[2] Karadeniz Tech Univ, Bilgisayar Muhendisligi, Trabzon, Turkiye

Source:

2023 31ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU | 2023

Keywords:

Vision transformer-based models; transformers; medical image classification

DOI:

10.1109/SIU59756.2023.10223892

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In recent years, convolutional neural networks have shown significant success and are frequently used in medical image analysis applications. However, because convolution operates on a local receptive field, it limits the learning of long-range pixel dependencies. Transformer architectures have been successful at encoding long-range dependencies and learning more efficient feature representations in natural language processing. Inspired by this, the study classifies publicly available color fundus retina, skin lesion, chest X-ray, and breast histology images using the Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), Swin Transformer, and Pyramid Vision Transformer v2 (PVTv2) models and compares their classification performance. The highest accuracy is obtained by the DeiT model on the chest X-ray dataset (96.5%), the PVTv2 model on the breast histology dataset (91.6%), the PVTv2 model on the retinal fundus dataset (91.3%), and the Swin model on the skin lesion dataset (91.0%).


Pages: 4
