Multi-Dataset Comparison of Vision Transformers and Convolutional Neural Networks for Detecting Glaucomatous Optic Neuropathy from Fundus Photographs

Cited by: 6
Authors
Hwang, Elizabeth E. [1 ,2 ]
Chen, Dake [1 ]
Han, Ying [1 ]
Jia, Lin [3 ]
Shan, Jing [1 ]
Affiliations
[1] Univ Calif San Francisco, Dept Ophthalmol, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Med Scientist Training Program, San Francisco, CA 94143 USA
[3] Digillect LLC, San Francisco, CA 94158 USA
Source
BIOENGINEERING-BASEL | 2023, Vol. 10, Issue 11
Keywords
glaucoma; deep learning; vision transformer; fundus photography;
DOI
10.3390/bioengineering10111266
CLC Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline Codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Glaucomatous optic neuropathy (GON) can be diagnosed and monitored using fundus photography, a widely available and low-cost approach already adopted for automated screening of ophthalmic diseases such as diabetic retinopathy. Despite this, the lack of validated early screening approaches remains a major obstacle in the prevention of glaucoma-related blindness. Deep learning models have gained significant interest as potential solutions, as these models offer objective and high-throughput methods for processing image-based medical data. While convolutional neural networks (CNNs) have been widely utilized for these purposes, more recent advances in the application of Transformer architectures have led to new models, including the Vision Transformer (ViT), that have shown promise in many domains of image analysis. However, previous comparisons of these two architectures have not sufficiently compared models side-by-side on more than a single dataset, making it unclear which model is more generalizable or performs better in different clinical contexts. Our purpose is to investigate comparable ViT and CNN models tasked with GON detection from fundus photos and highlight their respective strengths and weaknesses. We train CNN and ViT models on six unrelated, publicly available databases and compare their performance using well-established statistics including AUC, sensitivity, and specificity. Our results indicate that ViT models often show superior performance when compared with a similarly trained CNN model, particularly when non-glaucomatous images are over-represented in a given dataset. We discuss the clinical implications of these findings and suggest that ViT can further the development of accurate and scalable GON detection for this leading cause of irreversible blindness worldwide.
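The evaluation statistics named in the abstract (AUC, sensitivity, and specificity) can be illustrated with a minimal sketch for a binary GON classifier; the function names and toy scores below are our own illustration, not the authors' code, and AUC is computed with the Mann-Whitney rank formulation rather than any specific library.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive (glaucomatous) case scores higher than a random negative one,
    counting ties as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three glaucomatous (1) and three non-glaucomatous (0) images,
# with hypothetical model scores thresholded at 0.5 for the binary metrics.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
preds = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, preds)
```

Because AUC is threshold-free, it is the statistic least affected when non-glaucomatous images are over-represented, which is why class-imbalanced datasets are typically compared on AUC alongside sensitivity and specificity.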
Pages: 13
Related Papers
9 records
  • [1] Visualization Comparison of Vision Transformers and Convolutional Neural Networks
    Shi, Rui
    Li, Tianxing
    Zhang, Liguo
    Yamaguchi, Yasushi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2327 - 2339
  • [2] Detecting Glaucoma in Highly Myopic Eyes From Fundus Photographs Using Deep Convolutional Neural Networks
    Chen, Xiaohong
    Zhou, Chen
    Zhu, Yingting
    Luo, Man
    Hu, Lingjing
    Han, Wenjing
    Zuo, Chengguo
    Li, Zhidong
    Xiao, Hui
    Huang, Shaofen
    Chen, Xuhao
    Zhao, Xiujuan
    Lu, Lin
    Wang, Yizhou
    Zhuo, Yehong
    CLINICAL AND EXPERIMENTAL OPHTHALMOLOGY, 2025,
  • [3] Comparative Analysis of Vision Transformers and Conventional Convolutional Neural Networks in Detecting Referable Diabetic Retinopathy
    Goh, Jocelyn Hui Lin
    Ang, Elroy
    Srinivasan, Sahana
    Lei, Xiaofeng
    Loh, Johnathan
    Quek, Ten Cheer
    Xue, Cancan
    Xu, Xinxing
    Liu, Yong
    Cheng, Ching-Yu
    Rajapakse, Jagath C.
    Tham, Yih-Chung
    OPHTHALMOLOGY SCIENCE, 2024, 4 (06):
  • [4] Vision transformers for cotton boll segmentation: Hyperparameters optimization and comparison with convolutional neural networks
    Singh, Naseeb
    Tewari, V. K.
    Biswas, P. K.
    INDUSTRIAL CROPS AND PRODUCTS, 2025, 223
  • [5] Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review
    Takahashi, Satoshi
    Sakaguchi, Yusuke
    Kouno, Nobuji
    Takasawa, Ken
    Ishizu, Kenichi
    Akagi, Yu
    Aoyama, Rina
    Teraya, Naoki
    Bolatkan, Amina
    Shinkai, Norio
    Machino, Hidenori
    Kobayashi, Kazuma
    Asada, Ken
    Komatsu, Masaaki
    Kaneko, Syuzo
    Sugiyama, Masashi
    Hamamoto, Ryuji
    JOURNAL OF MEDICAL SYSTEMS, 2024, 48 (01)
  • [6] Head and Neck Cancer Segmentation in FDG PET Images: Performance Comparison of Convolutional Neural Networks and Vision Transformers
    Xiong, Xiaofan
    Smith, Brian J.
    Graves, Stephen A.
    Graham, Michael M.
    Buatti, John M.
    Beichel, Reinhard R.
    TOMOGRAPHY, 2023, 9 (05) : 1933 - 1948
  • [7] Comparison of the Performance of Convolutional Neural Networks and Vision Transformer-Based Systems for Automated Glaucoma Detection with Eye Fundus Images
    Alayon, Silvia
    Hernandez, Jorge
    Fumero, Francisco J.
    Sigut, Jose F.
    Diaz-Aleman, Tinguaro
    APPLIED SCIENCES-BASEL, 2023, 13 (23):
  • [8] A Comparative Study of Deep Learning Classification Methods on a Small Environmental Microorganism Image Dataset (EMDS-6): From Convolutional Neural Networks to Visual Transformers
    Zhao, Peng
    Li, Chen
    Rahaman, Md Mamunur
    Xu, Hao
    Yang, Hechen
    Sun, Hongzan
    Jiang, Tao
    Grzegorzek, Marcin
    FRONTIERS IN MICROBIOLOGY, 2022, 13
  • [9] Is the aspect ratio of cells important in deep learning? A robust comparison of deep learning methods for multi-scale cytopathology cell image classification: From convolutional neural networks to visual transformers
    Liu, Wanli
    Li, Chen
    Jiang, Tao
    Sun, Hongzan
    Wu, Xiangchen
    Hu, Weiming
    Chen, Haoyuan
    Sun, Changhao
    Yao, Yudong
    Grzegorzek, Marcin
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 141