Integrating self-supervised learning with vision transformers for glaucoma detection

Cited by: 0
Authors
Liao, Caisheng [1]
Todo, Yuki [2]
Tang, Zheng [3]
Affiliations
[1] Kanazawa Univ, Div Elect Informat & Commun Engn, Kanazawa, Japan
[2] Kanazawa Univ, Fac Elect Informat & Commun Engn, Kanazawa, Japan
[3] Univ Toyama, Dept Intelligence Informat Syst, Toyama, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
volume contrast; vision transformer; early glaucoma detection; fundus image classification; DIAGNOSIS; FRAMEWORK; CNN;
DOI
10.1117/1.JEI.34.2.023016
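Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract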
Traditional deep learning models, such as convolutional neural networks (CNNs), have played a crucial role in assisting the early diagnosis of glaucoma. However, these models face limitations in capturing the long-range dependencies required for effective glaucoma detection. We propose a hybrid model that combines the Vision Transformer (ViT) with a self-supervised volume contrast (VoCo) learning framework. The ViT is used to extract global contextual features, whereas VoCo focuses on capturing generalized fine-grained representations. After extracting these multi-level features, the model employs a weighted fusion mechanism to select and integrate the most relevant features, ensuring robustness in the diagnostic process. The model was validated using the JustRAIGS 2024 dataset. The proposed ViT+VoCo model demonstrates exceptional performance on key metrics: sensitivity of 0.8355, specificity of 0.9594, accuracy of 0.9556, and area under the curve (AUC) of 0.9711. Specifically, the accuracy and AUC improved by 1.55% and 1.45%, respectively, compared with the baseline ViT, and it outperformed classic CNNs and state-of-the-art systems on most metrics. Despite certain challenges, we validate the effectiveness of integrating global features with fine-grained generalized representations for early glaucoma detection. The results underscore the potential of this hybrid approach and highlight its promise in clinical applications. Future work could explore multimodal learning, domain adaptation, and model interpretability to further enhance the model's clinical applicability and impact.
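The abstract does not specify how the weighted fusion is computed, so the following is only a minimal sketch of one common form: a softmax-normalized weighted sum of two feature vectors. The function names (`softmax`, `weighted_fusion`), the `gate_scores` parameter, and the choice of softmax weighting are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(vit_features, voco_features, gate_scores):
    """Fuse two equal-length feature vectors with softmax-normalized weights.

    gate_scores is a hypothetical pair of scalars (one per branch); in a
    trained model such scores would typically be learned parameters.
    """
    if len(vit_features) != len(voco_features):
        raise ValueError("feature vectors must have the same length")
    w_vit, w_voco = softmax(gate_scores)
    return [w_vit * a + w_voco * b for a, b in zip(vit_features, voco_features)]


# Equal gate scores give each branch a weight of 0.5.
fused = weighted_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```

With equal gate scores the result is the element-wise mean of the two vectors; raising one score shifts the fused vector toward that branch.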
Pages: 17