Uni4Eye++: A General Masked Image Modeling Multi-Modal Pre-Training Framework for Ophthalmic Image Classification and Segmentation

Cited by: 2
Authors
Cai, Zhiyuan [1 ,2 ]
Lin, Li [1 ,2 ]
He, Huaqing [1 ,2 ]
Cheng, Pujin [1 ,2 ]
Tang, Xiaoying [1 ,2 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Elect & Elect Engn, Shenzhen 518055, Peoples R China
[2] Southern Univ Sci & Technol, Jiaxing Res Inst, Jiaxing 314031, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Training; Medical diagnostic imaging; Image reconstruction; Transformers; Head; Three-dimensional displays; Self-supervised learning; masked image modeling; ophthalmic image analysis; cross-dimension; multi-modal pre-training;
DOI
10.1109/TMI.2024.3422102
Chinese Library Classification
TP39 [Computer Applications];
Subject Classification Codes
081203 ; 0835 ;
Abstract
A large-scale labeled dataset is a key factor for the success of supervised deep learning in most ophthalmic image analysis scenarios. However, annotated data are often limited in ophthalmic image analysis, since manual annotation is time-consuming and labor-intensive. Self-supervised learning (SSL) methods offer great opportunities to better utilize unlabeled data, as they do not require massive annotations. To exploit as many unlabeled ophthalmic images as possible, it is necessary to break the dimension barrier, simultaneously making use of both 2D and 3D images while alleviating the issue of catastrophic forgetting. In this paper, we propose a universal self-supervised Transformer framework, named Uni4Eye++, to discover intrinsic image characteristics and capture domain-specific feature embeddings in ophthalmic images. Uni4Eye++ serves as a global feature extractor built upon a Masked Image Modeling task with a Vision Transformer architecture. Building on our previous work Uni4Eye, we further employ an image-entropy-guided masking strategy to reconstruct more informative patches and a dynamic head generator module to alleviate modality confusion. We evaluate the performance of the pre-trained Uni4Eye++ encoder by fine-tuning it on multiple downstream ophthalmic image classification and segmentation tasks. The superiority of Uni4Eye++ is established through comparisons with other state-of-the-art SSL pre-training methods. Our code is available at https://github.com/Davidczy/Uni4Eye++.
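The entropy-guided masking idea in the abstract can be illustrated with a short sketch. This is a minimal illustration only, assuming that per-patch Shannon entropy of the intensity histogram is used to rank patch informativeness; the function names, parameters, and the top-k selection rule below are hypothetical and may differ from the authors' released implementation.

import numpy as np

def patch_entropy(patch, bins=256):
    # Shannon entropy of the patch's intensity histogram (higher = more texture).
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_guided_mask(image, patch_size=16, mask_ratio=0.75):
    # Return indices of patches to mask, favoring high-entropy (informative)
    # patches so the MIM decoder must reconstruct them.
    # `image` is a 2D float array scaled to [0, 1]; sketch only -- the paper's
    # actual sampling rule may be probabilistic rather than top-k.
    h, w = image.shape
    ph, pw = h // patch_size, w // patch_size
    entropies = np.empty(ph * pw)
    for i in range(ph):
        for j in range(pw):
            patch = image[i * patch_size:(i + 1) * patch_size,
                          j * patch_size:(j + 1) * patch_size]
            entropies[i * pw + j] = patch_entropy(patch)
    n_mask = int(mask_ratio * ph * pw)
    return np.argsort(entropies)[::-1][:n_mask]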
Pages: 4419-4429
Number of pages: 11