Efficient pneumonia detection using Vision Transformers on chest X-rays

Cited by: 25

Authors:

Singh, Sukhendra [1]

Kumar, Manoj [1]

Kumar, Abhay [2]

Verma, Birendra Kumar [1]

Abhishek, Kumar [2]

Selvarajan, Shitharth [3]

Affiliations:

[1] JSS Acad Tech Educ, Noida, India

[2] Natl Inst Technol Patna, Patna, India

[3] Leeds Beckett Univ, Sch Built Environm Engn & Comp, Leeds LS1 3HE, England

Source:

SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01

Keywords:

ATTENTION;

DOI:

10.1038/s41598-024-52703-2

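Chinese Library Classification (CLC) number:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification code: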

07 ; 0710 ; 09 ;

Abstract:

Pneumonia is a widespread and acute respiratory infection that impacts people of all ages. Early detection and treatment of pneumonia are essential for avoiding complications and enhancing clinical results. By devising and deploying effective detection methods, we can reduce mortality, improve healthcare efficiency, and contribute to the global battle against a disease that has plagued humanity for centuries. Detecting pneumonia is not only a medical necessity but also a humanitarian imperative and a technological frontier. Chest X-rays are a frequently used imaging modality for diagnosing pneumonia. This paper examines in detail a method for detecting pneumonia that is built on the Vision Transformer (ViT) architecture and applied to a public dataset of chest X-rays available on Kaggle. To acquire global context and spatial relationships from chest X-ray images, the proposed framework deploys the ViT model, which integrates self-attention mechanisms and transformer architecture. In our experiments, the proposed Vision Transformer-based framework achieves an accuracy of 97.61%, a sensitivity of 95%, and a specificity of 98% in detecting pneumonia from chest X-rays. The ViT model is well suited to capturing global context, comprehending spatial relationships, and processing images that have different resolutions. The framework establishes its efficacy as a robust pneumonia detection solution by surpassing convolutional neural network (CNN)-based architectures.
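The three figures reported in the abstract (accuracy, sensitivity, specificity) all derive from a binary confusion matrix. The following is a minimal sketch of how such metrics are typically computed, assuming labels are encoded as 1 for pneumonia and 0 for normal. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Assumption: 1 = pneumonia (positive class), 0 = normal (negative class).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    return {
        # Fraction of all images classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Fraction of pneumonia cases that were detected (recall).
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of normal cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels for illustration only; these are NOT results from the paper.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Sensitivity and specificity are reported alongside accuracy because a screening model can score well on accuracy while still missing positive cases when the classes are imbalanced.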


Pages: 17
