Conv-ViT: A Convolution and Vision Transformer-Based Hybrid Feature Extraction Method for Retinal Disease Detection

Cited by: 41
Authors
Dutta, Pramit [1]
Sathi, Khaleda Akther [1]
Hossain, Md. Azad [1]
Dewan, M. Ali Akber [2]
Affiliations
[1] Chittagong Univ Engn & Technol, Dept Elect & Telecommun Engn, Chattogram 4349, Bangladesh
[2] Athabasca Univ, Fac Sci & Technol, Sch Comp & Informat Syst, Athabasca, AB T9S 3A3, Canada
Keywords
retinal disease; classification; hybrid feature; Inception-V3; ResNet-50; vision transformer; MACULAR DEGENERATION; PREVALENCE; ENSEMBLE
DOI
10.3390/jimaging9070140
Chinese Library Classification (CLC) number
TB8 [Photographic technology]
Discipline classification code
0804
Abstract
Recent advances in retinal disease detection have mainly focused on extracting a single kind of feature using either a convolutional neural network (CNN) or a transformer-based end-to-end deep learning (DL) model. Such individual end-to-end DL models process only texture-based or only shape-based information when performing detection. However, extracting only texture- or shape-based features does not give a model the robustness needed to classify different types of retinal diseases. Therefore, to exploit both kinds of features, this paper develops a fusion model called 'Conv-ViT' to detect retinal diseases from foveal-cut optical coherence tomography (OCT) images. Transfer learning-based CNN models, namely Inception-V3 and ResNet-50, are used to process texture information by computing correlations among nearby pixels. Additionally, a vision transformer model is fused in to process shape-based features by determining correlations between distant pixels. Hybridizing these three models yields shape-based texture feature learning when classifying retinal diseases into four classes: choroidal neovascularization (CNV), diabetic macular edema (DME), DRUSEN, and NORMAL. The weighted average classification accuracy, precision, recall, and F1 score of the model are each approximately 94%. The results indicate that fusing texture and shape features helped the proposed Conv-ViT model outperform state-of-the-art retinal disease classification models.
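The abstract reports weighted average metrics over the four classes. As a minimal sketch of how such support-weighted averages are commonly computed from a confusion matrix, the following may help; it is not the authors' evaluation code. The function name `weighted_metrics` and the confusion-matrix counts are hypothetical, made up purely for illustration, and are not results from the paper.

```python
from typing import Dict, List

# Class names taken from the abstract.
CLASSES = ["CNV", "DME", "DRUSEN", "NORMAL"]


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Support-weighted precision, recall, and F1, plus overall accuracy.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    result = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in range(n):
        tp = confusion[k][k]
        support = sum(confusion[k])                         # true samples of class k
        predicted = sum(confusion[i][k] for i in range(n))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total                            # class share of the data
        result["precision"] += weight * precision
        result["recall"] += weight * recall
        result["f1"] += weight * f1
    result["accuracy"] = sum(confusion[k][k] for k in range(n)) / total
    return result


# Hypothetical counts (rows: true class, columns: predicted class),
# in the order given by CLASSES. NOT data from the paper.
example_confusion = [
    [90, 5, 3, 2],
    [4, 92, 2, 2],
    [6, 3, 88, 3],
    [1, 1, 2, 96],
]

metrics = weighted_metrics(example_confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this definition, the support-weighted recall always equals the overall accuracy, because each class's recall is weighted by exactly its share of the samples.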
Pages: 20