Vision Transformers in medical computer vision - A contemplative retrospection

Cited by: 113
Authors
Parvaiz, Arshi [1]
Khalid, Muhammad Anwaar [1]
Zafar, Rukhsana [1]
Ameer, Huma [1]
Ali, Muhammad [1]
Fraz, Muhammad Moazam [1]
Affiliations
[1] Natl Univ Sci & Technol NUST, Islamabad 44000, Pakistan
Keywords
Vision Transformers; Medical image analysis; Self attention; Medical computer vision; Diagnostic image analysis; Literature survey; CONVOLUTIONAL NEURAL-NETWORK; BARRETTS-ESOPHAGUS; IMAGE DATABASE; LUNG-CANCER; SEGMENTATION; COVID-19; TOMOGRAPHY; CNN; LOCALIZATION; PREDICTION;
DOI
10.1016/j.engappai.2023.106126
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Vision Transformers (ViTs), with their considerable potential to extract the information contained within images, have become one of the most prominent contemporary architectures in computer vision. They are widely used by researchers, both for new experiments and to revisit earlier ones. In this article, we investigate the intersection of vision transformers and medical images. We present an overview of the various ViT-based frameworks that researchers use to address challenges in medical computer vision. We survey the applications of Vision Transformers in different areas of medical computer vision, such as image-based disease classification, anatomical structure segmentation, registration, region-based lesion detection, captioning, report generation, and reconstruction, using multiple medical imaging modalities that greatly assist medical diagnosis and hence the treatment process. We also describe several imaging modalities used in medical computer vision. To provide deeper insight, the self-attention mechanism of transformers is briefly explained. Finally, the ViT-based solutions for each image analytics task are critically analyzed, open challenges are discussed, and pointers to possible solutions for future directions are given. We hope this review article will open future research directions for medical computer vision researchers.
Pages: 38