Uni4Eye++: A General Masked Image Modeling Multi-Modal Pre-Training Framework for Ophthalmic Image Classification and Segmentation

Cited by: 0
Authors
Cai, Zhiyuan [1 ,2 ]
Lin, Li [1 ,2 ]
He, Huaqing [1 ,2 ]
Cheng, Pujin [1 ,2 ]
Tang, Xiaoying [1 ,2 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen 518055, Peoples R China
[2] Southern Univ Sci & Technol, Jiaxing Res Inst, Jiaxing 314031, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Training; Medical diagnostic imaging; Image reconstruction; Transformers; Head; Three-dimensional displays; Self-supervised learning; masked image modeling; ophthalmic image analysis; cross-dimension; multi-modal pre-training;
DOI
10.1109/TMI.2024.3422102
Chinese Library Classification
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835;
Abstract
A large-scale labeled dataset is a key factor in the success of supervised deep learning in most ophthalmic image analysis scenarios. However, annotated data is often scarce in ophthalmic image analysis, since manual annotation is time-consuming and labor-intensive. Self-supervised learning (SSL) methods offer great opportunities for better utilizing unlabeled data, as they do not require massive annotations. To exploit as many unlabeled ophthalmic images as possible, it is necessary to break the dimension barrier, making use of both 2D and 3D images simultaneously while alleviating the issue of catastrophic forgetting. In this paper, we propose a universal self-supervised Transformer framework, named Uni4Eye++, to discover intrinsic image characteristics and capture domain-specific feature embeddings in ophthalmic images. Uni4Eye++ serves as a global feature extractor, built upon a Masked Image Modeling task with a Vision Transformer architecture. Building on our previous work Uni4Eye, we further employ an image-entropy-guided masking strategy to reconstruct more informative patches, and a dynamic head generator module to alleviate modality confusion. We evaluate the performance of our pre-trained Uni4Eye++ encoder by fine-tuning it on multiple downstream ophthalmic image classification and segmentation tasks. The superiority of Uni4Eye++ is established through comparisons with other state-of-the-art SSL pre-training methods. Our code is available at https://github.com/Davidczy/Uni4Eye++.
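The image-entropy-guided masking idea mentioned in the abstract can be illustrated with a minimal sketch: patches whose intensity histograms have higher Shannon entropy tend to carry more structure, so the mask can be biased toward them. Everything below is a hypothetical illustration under that assumption, not the paper's implementation; the names patch_entropy and entropy_guided_mask, the 32-bin histogram, the 0.75 mask ratio, and the highest-entropy-first rule are all illustrative choices.

import numpy as np

def patch_entropy(patch, bins=32):
    # Shannon entropy of the patch's intensity histogram (illustrative).
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_guided_mask(image, patch_size=16, mask_ratio=0.75):
    # Return indices of patches to mask, biased toward high-entropy
    # (more informative) patches. Hypothetical sketch, not the paper's code.
    h, w = image.shape
    patches = [image[i:i + patch_size, j:j + patch_size]
               for i in range(0, h, patch_size)
               for j in range(0, w, patch_size)]
    entropies = np.array([patch_entropy(p) for p in patches])
    n_mask = int(mask_ratio * len(patches))
    return np.argsort(-entropies)[:n_mask]  # highest-entropy patches first

# Usage: mask 75% of the 16x16 patches of a normalized grayscale image.
image = np.random.rand(224, 224)  # stand-in for a fundus image
masked_indices = entropy_guided_mask(image)

In an MAE-style pipeline, such indices would determine which patch tokens are withheld from the encoder and reconstructed by the decoder; a softer variant could instead sample the mask with probability proportional to entropy.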
Pages: 4419-4429
Page count: 11