LViT: Language Meets Vision Transformer in Medical Image Segmentation

Cited by: 45

Authors:

Li, Zihan [1,2]

Li, Yunxiang [3]

Li, Qingde [4]

Wang, Puyang [5]

Guo, Dazhou [6]

Lu, Le [6]

Jin, Dakai [6]

Zhang, You [3]

Hong, Qingqi [7,8]

Affiliations:

[1] Xiamen Univ, Sch Informat, Xiamen 361005, Peoples R China

[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA

[3] UT SouthWestern Med Ctr, Dept Radiat Oncol, Dallas, TX 75235 USA

[4] Univ Hull, Sch Comp Sci, Kingston Upon Hull HU6 7RX, England

[5] Alibaba Grp, DAMO Acad, Hangzhou 310024, Peoples R China

[6] Alibaba Grp, DAMO Acad, New York, NY 10014 USA

[7] Xiamen Univ, Dept Digital Media Technol, Xiamen 361005, Peoples R China

[8] Hong Kong Ctr Cerebrocardiovasc Hlth Engn COCHE, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2024, Vol. 43, No. 1

Keywords:

Biomedical imaging; Image segmentation; Transformers; Convolutional neural networks; Feature extraction; Visualization; Data models; Vision-language; medical image segmentation; semi-supervised learning;

DOI:

10.1109/TMI.2023.3291719

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Subject classification codes:

081203; 0835;

Abstract:

Deep learning has been widely used in medical image segmentation and other areas. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining sufficient high-quality labeled data due to the prohibitive data annotation cost. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-rays and CT images. Experimental results show that our proposed LViT has superior segmentation performance in both fully-supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.

Pages: 96-107

Page count: 12

