LViT: Language Meets Vision Transformer in Medical Image Segmentation

Cited by: 45
Authors
Li, Zihan [1, 2]
Li, Yunxiang [3]
Li, Qingde [4]
Wang, Puyang [5]
Guo, Dazhou [6]
Lu, Le [6]
Jin, Dakai [6]
Zhang, You [3]
Hong, Qingqi [7, 8]
Affiliations
[1] Xiamen Univ, Sch Informat, Xiamen 361005, Peoples R China
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[3] UT SouthWestern Med Ctr, Dept Radiat Oncol, Dallas, TX 75235 USA
[4] Univ Hull, Sch Comp Sci, Kingston Upon Hull HU6 7RX, England
[5] Alibaba Grp, DAMO Acad, Hangzhou 310024, Peoples R China
[6] Alibaba Grp, DAMO Acad, New York, NY 10014 USA
[7] Xiamen Univ, Dept Digital Media Technol, Xiamen 361005, Peoples R China
[8] Hong Kong Ctr Cerebrocardiovasc Hlth Engn COCHE, Hong Kong, Peoples R China
Keywords
Biomedical imaging; Image segmentation; Transformers; Convolutional neural networks; Feature extraction; Visualization; Data models; Vision-language; Medical image segmentation; Semi-supervised learning
DOI
10.1109/TMI.2023.3291719
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Deep learning has been widely used in medical image segmentation and related tasks. However, the performance of existing medical image segmentation models has been limited by the difficulty of obtaining sufficient high-quality labeled data, owing to the prohibitive cost of data annotation. To alleviate this limitation, we propose a new text-augmented medical image segmentation model, LViT (Language meets Vision Transformer). In our LViT model, medical text annotation is incorporated to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of higher-quality pseudo labels in semi-supervised learning. We also propose an Exponential Pseudo label Iteration mechanism (EPI) to help the Pixel-Level Attention Module (PLAM) preserve local image features in the semi-supervised LViT setting. In our model, an LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. For evaluation, we construct three multimodal medical segmentation datasets (image + text) containing X-rays and CT images. Experimental results show that our proposed LViT has superior segmentation performance in both fully-supervised and semi-supervised settings. The code and datasets are available at https://github.com/HUANGLIZI/LViT.
Pages: 96-107
Page count: 12