Multi-Dataset Comparison of Vision Transformers and Convolutional Neural Networks for Detecting Glaucomatous Optic Neuropathy from Fundus Photographs

Cited by: 6
Authors
Hwang, Elizabeth E. [1 ,2 ]
Chen, Dake [1 ]
Han, Ying [1 ]
Jia, Lin [3 ]
Shan, Jing [1 ]
Affiliations
[1] Univ Calif San Francisco, Dept Ophthalmol, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Med Scientist Training Program, San Francisco, CA 94143 USA
[3] Digillect LLC, San Francisco, CA 94158 USA
Source
BIOENGINEERING-BASEL | 2023, Vol. 10, Issue 11
Keywords
glaucoma; deep learning; vision transformer; fundus photography;
DOI
10.3390/bioengineering10111266
CLC Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline Codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Glaucomatous optic neuropathy (GON) can be diagnosed and monitored using fundus photography, a widely available and low-cost approach already adopted for automated screening of ophthalmic diseases such as diabetic retinopathy. Despite this, the lack of validated early screening approaches remains a major obstacle in the prevention of glaucoma-related blindness. Deep learning models have gained significant interest as potential solutions, as these models offer objective and high-throughput methods for processing image-based medical data. While convolutional neural networks (CNNs) have been widely utilized for these purposes, more recent advances in the application of Transformer architectures have led to new models, including the Vision Transformer (ViT), that have shown promise in many domains of image analysis. However, previous comparisons of these two architectures have not sufficiently compared models side-by-side on more than a single dataset, making it unclear which model is more generalizable or performs better in different clinical contexts. Our purpose is to investigate comparable ViT and CNN models tasked with GON detection from fundus photos and highlight their respective strengths and weaknesses. We train CNN and ViT models on six unrelated, publicly available databases and compare their performance using well-established statistics including AUC, sensitivity, and specificity. Our results indicate that ViT models often show superior performance when compared with a similarly trained CNN model, particularly when non-glaucomatous images are over-represented in a given dataset. We discuss the clinical implications of these findings and suggest that ViT can further the development of accurate and scalable GON detection for this leading cause of irreversible blindness worldwide.
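The evaluation statistics named in the abstract (AUC, sensitivity, and specificity) can be illustrated with a minimal sketch for a binary GON classifier; the function names and toy scores below are our own illustration, not the authors' code, and AUC is computed with the Mann-Whitney rank formulation rather than any specific library.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive (glaucomatous) case scores higher than a random negative one,
    counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three glaucomatous (1) and three non-glaucomatous (0) images,
# with hypothetical model scores thresholded at 0.5 for the binary metrics.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, preds)
```

Because AUC is threshold-free, it is the statistic least affected when non-glaucomatous images are over-represented, which is why class-imbalanced datasets are typically compared on AUC alongside sensitivity and specificity.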
Pages: 13
Related Papers
9 records
  • [1] Visualization Comparison of Vision Transformers and Convolutional Neural Networks
    Shi, Rui
    Li, Tianxing
    Zhang, Liguo
    Yamaguchi, Yasushi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2327 - 2339
  • [2] Detecting Glaucoma in Highly Myopic Eyes From Fundus Photographs Using Deep Convolutional Neural Networks
    Chen, Xiaohong
    Zhou, Chen
    Zhu, Yingting
    Luo, Man
    Hu, Lingjing
    Han, Wenjing
    Zuo, Chengguo
    Li, Zhidong
    Xiao, Hui
    Huang, Shaofen
    Chen, Xuhao
    Zhao, Xiujuan
    Lu, Lin
    Wang, Yizhou
    Zhuo, Yehong
    CLINICAL AND EXPERIMENTAL OPHTHALMOLOGY, 2025,
  • [3] Comparative Analysis of Vision Transformers and Conventional Convolutional Neural Networks in Detecting Referable Diabetic Retinopathy
    Goh, Jocelyn Hui Lin
    Ang, Elroy
    Srinivasan, Sahana
    Lei, Xiaofeng
    Loh, Johnathan
    Quek, Ten Cheer
    Xue, Cancan
    Xu, Xinxing
    Liu, Yong
    Cheng, Ching-Yu
    Rajapakse, Jagath C.
    Tham, Yih-Chung
    OPHTHALMOLOGY SCIENCE, 2024, 4 (06):
  • [4] Vision transformers for cotton boll segmentation: Hyperparameters optimization and comparison with convolutional neural networks
    Singh, Naseeb
    Tewari, V. K.
    Biswas, P. K.
    INDUSTRIAL CROPS AND PRODUCTS, 2025, 223
  • [5] Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review
    Takahashi, Satoshi
    Sakaguchi, Yusuke
    Kouno, Nobuji
    Takasawa, Ken
    Ishizu, Kenichi
    Akagi, Yu
    Aoyama, Rina
    Teraya, Naoki
    Bolatkan, Amina
    Shinkai, Norio
    Machino, Hidenori
    Kobayashi, Kazuma
    Asada, Ken
    Komatsu, Masaaki
    Kaneko, Syuzo
    Sugiyama, Masashi
    Hamamoto, Ryuji
    JOURNAL OF MEDICAL SYSTEMS, 2024, 48 (01)
  • [6] Head and Neck Cancer Segmentation in FDG PET Images: Performance Comparison of Convolutional Neural Networks and Vision Transformers
    Xiong, Xiaofan
    Smith, Brian J.
    Graves, Stephen A.
    Graham, Michael M.
    Buatti, John M.
    Beichel, Reinhard R.
    TOMOGRAPHY, 2023, 9 (05) : 1933 - 1948
  • [7] Comparison of the Performance of Convolutional Neural Networks and Vision Transformer-Based Systems for Automated Glaucoma Detection with Eye Fundus Images
    Alayon, Silvia
    Hernandez, Jorge
    Fumero, Francisco J.
    Sigut, Jose F.
    Diaz-Aleman, Tinguaro
    APPLIED SCIENCES-BASEL, 2023, 13 (23):
  • [8] A Comparative Study of Deep Learning Classification Methods on a Small Environmental Microorganism Image Dataset (EMDS-6): From Convolutional Neural Networks to Visual Transformers
    Zhao, Peng
    Li, Chen
    Rahaman, Md Mamunur
    Xu, Hao
    Yang, Hechen
    Sun, Hongzan
    Jiang, Tao
    Grzegorzek, Marcin
    FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [9] Is the aspect ratio of cells important in deep learning? A robust comparison of deep learning methods for multi-scale cytopathology cell image classification: From convolutional neural networks to visual transformers
    Liu, Wanli
    Li, Chen
    Jiang, Tao
    Sun, Hongzan
    Wu, Xiangchen
    Hu, Weiming
    Chen, Haoyuan
    Sun, Changhao
    Yao, Yudong
    Grzegorzek, Marcin
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 141