Cervical OCT image classification using contrastive masked autoencoders with Swin Transformer

Cited by: 0

Authors:

Wang, Qingbin [1]

Xiong, Yuxuan [1]

Zhu, Hanfeng [2,3]

Mu, Xuefeng [4]

Zhang, Yan [4]

Ma, Yutao [2,3]

Affiliations:

[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China

[2] Cent China Normal Univ, Sch Comp Sci, Wuhan 430079, Peoples R China

[3] Cent China Normal Univ, Hubei Prov Key Lab Artificial Intelligence & Smart, Wuhan 430079, Peoples R China

[4] Wuhan Univ, Renmin Hosp, Dept Obstet & Gynecol, Wuhan 430060, Peoples R China

Source:

COMPUTERIZED MEDICAL IMAGING AND GRAPHICS | 2024 / Vol. 118

Keywords:

Cervical cancer; Optical coherence tomography; Image classification; Self-supervised learning; Swin Transformer; Interpretability

DOI:

10.1016/j.compmedimag.2024.102469

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering]

Discipline classification code:

0831

Abstract:

Background and Objective: Cervical cancer poses a major health threat to women globally. Optical coherence tomography (OCT) imaging has recently shown promise for non-invasive cervical lesion diagnosis. However, obtaining high-quality labeled cervical OCT images is challenging and time-consuming as they must correspond precisely with pathological results. The scarcity of such high-quality labeled data hinders the application of supervised deep-learning models in practical clinical settings. This study addresses the above challenge by proposing CMSwin, a novel self-supervised learning (SSL) framework combining masked image modeling (MIM) with contrastive learning based on the Swin-Transformer architecture to utilize abundant unlabeled cervical OCT images.
Methods: In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to solve the inconsistency problem between pre-training and fine-tuning and separate the encoder's feature extraction task from the decoder's reconstruction task, allowing the encoder to extract better image representations. Besides, contrastive losses at the patch and image levels are elaborately designed to leverage massive unlabeled data.
Results: We validated the superiority of CMSwin over the state-of-the-art SSL approaches with five-fold cross-validation on an OCT image dataset containing 1,452 patients from a multi-center clinical study in China, plus two external validation sets from top-ranked Chinese hospitals: the Huaxi dataset from the West China Hospital of Sichuan University and the Xiangya dataset from the Xiangya Second Hospital of Central South University. A human-machine comparison experiment on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that CMSwin can match or exceed the average level of four skilled medical experts, especially in identifying high-risk cervical lesions.
Conclusion: Our work has great potential to assist gynecologists in intelligently interpreting cervical OCT images in clinical settings. Additionally, the integrated GradCAM module of CMSwin enables cervical lesion visualization and interpretation, providing good interpretability for gynecologists to diagnose cervical diseases efficiently.

Pages: 14
