Vision Transformer-based recognition of diabetic retinopathy grade

Cited: 68
Authors
Wu, Jianfang [1 ]
Hu, Ruo [1 ]
Xiao, Zhenghong [1 ]
Chen, Jiaxu [2 ]
Liu, Jingwei [3 ]
Affiliations
[1] Guangdong Polytech Normal Univ, Sch Comp Sci, Guangzhou, Peoples R China
[2] Jinan Univ, Sch Tradit Chinese Med, Guangzhou, Peoples R China
[3] Huidong Peoples Hosp, Huizhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
deep learning; diabetic retinopathy; multi-head attention; Vision Transformer; ATTENTION;
DOI
10.1002/mp.15312
Chinese Library Classification (CLC)
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline codes
1002; 100207; 1009;
Abstract
Background: In the domain of natural language processing, Transformers are recognized as state-of-the-art models which, unlike typical convolutional neural networks (CNNs), do not rely on convolution layers. Instead, Transformers employ multi-head attention mechanisms as the main building block to capture long-range contextual relations between image pixels. Recently, CNNs have dominated deep learning solutions for diabetic retinopathy grade recognition. However, spurred by the advantages of Transformers, we propose a Transformer-based method suited to recognizing the grade of diabetic retinopathy.

Purpose: The purposes of this work are to demonstrate that (i) a pure attention mechanism is suitable for diabetic retinopathy grade recognition and (ii) Transformers can replace traditional CNNs for this task.

Methods: This paper proposes a Vision Transformer-based method to recognize the grade of diabetic retinopathy. Fundus images are subdivided into non-overlapping patches, which are flattened into sequences and passed through a linear and positional embedding step that preserves positional information. The resulting sequence is fed into several multi-head attention layers to generate the final representation. In the classification stage, the first token of this sequence is input to a softmax classification layer to produce the recognition output.

Results: The dataset for training and testing comprises fundus images of different resolutions, subdivided into patches. We benchmark our method against current CNNs and extreme learning machines and achieve appealing performance. Specifically, the proposed deep learning architecture attains an accuracy of 91.4%, a specificity of 0.977 (95% confidence interval (CI): 0.951-1), a precision of 0.928 (95% CI: 0.852-1), a sensitivity of 0.926 (95% CI: 0.863-0.989), a quadratic weighted kappa score of 0.935, and an area under the curve (AUC) of 0.986.

Conclusion: Our comparative experiments against current methods show that our model is competitive and highlight that an attention mechanism based on a Vision Transformer model is promising for the diabetic retinopathy grade recognition task.
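The following is a minimal PyTorch sketch of the pipeline outlined in the Methods above: non-overlapping patch embedding, a class token with positional embeddings, a stack of multi-head attention (Transformer encoder) layers, and a classifier applied to the first token. The patch size, embedding width, depth, head count, and image resolution are illustrative assumptions, not the paper's reported hyperparameters.

```python
# Minimal Vision Transformer sketch for diabetic retinopathy grading.
# Hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn


class ViTForDRGrading(nn.Module):
    def __init__(self, image_size=384, patch_size=16, in_chans=3,
                 embed_dim=768, depth=12, num_heads=12, num_classes=5):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2

        # Split the fundus image into non-overlapping patches and linearly
        # embed each flattened patch (implemented as a strided convolution).
        self.patch_embed = nn.Conv2d(in_chans, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)

        # Learnable class token and positional embeddings preserve positional
        # information for the otherwise permutation-invariant attention layers.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))

        # Stack of multi-head self-attention (Transformer encoder) layers.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=4 * embed_dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)

        # The first (class) token feeds a classification layer over DR grades;
        # softmax is applied at inference (or inside the training loss).
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                          # x: (B, 3, H, W)
        x = self.patch_embed(x)                    # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)           # (B, N, D) patch sequence
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                  # logits over DR grades


# Usage example: grade a batch of two (random) fundus images.
model = ViTForDRGrading()
logits = model(torch.randn(2, 3, 384, 384))
grades = logits.softmax(dim=-1).argmax(dim=-1)     # predicted grade per image
```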
Pages: 7850-7863
Page count: 14