MESTrans: Multi-scale embedding spatial transformer for medical image segmentation

Cited: 15
Authors
Liu, Yatong [1 ]
Zhu, Yu [1 ,2 ]
Xin, Ying [3 ]
Zhang, Yanan [3 ]
Yang, Dawei [2 ,4 ]
Xu, Tao [3 ]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
[2] Shanghai Engn Res Ctr Internet Things Resp Med, Shanghai 200237, Peoples R China
[3] Qingdao Univ, Dept Pulm & Crit Care Med, Affiliated Hosp, Qingdao 266000, Shandong, Peoples R China
[4] Fudan Univ, Zhongshan Hosp, Dept Pulm & Crit Care Med, Shanghai 200032, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer-aided diagnosis; COVID-19; Medical image segmentation; Transformer; ATTENTION; REGIONS;
DOI
10.1016/j.cmpb.2023.107493
Chinese Library Classification (CLC)
TP39 [Applications of computers];
Discipline codes
081203; 0835;
Abstract
Background and objective: Transformers profiting from global information modeling derived from the self-attention mechanism have recently achieved remarkable performance in computer vision. In this study, a novel transformer-based medical image segmentation network called the multi-scale embedding spatial transformer (MESTrans) was proposed for medical image segmentation. Methods: First, a dataset called COVID-DS36 was created from 4369 computed tomography (CT) images of 36 patients from a partner hospital, of which 18 had COVID-19 and 18 did not. Subsequently, a novel medical image segmentation network was proposed, which introduced a self-attention mechanism to improve the inherent limitation of convolutional neural networks (CNNs) and was capable of adaptively extracting discriminative information in both global and local content. Specifically, based on U-Net, a multi-scale embedding block (MEB) and multi-layer spatial attention transformer (SATrans) structure were designed, which can dynamically adjust the receptive field in accordance with the input content. The spatial relationship between multi-level and multi-scale image patches was modeled, and the global context information was captured effectively. To make the network concentrate on the salient feature region, a feature fusion module (FFM) was established, which performed global learning and soft selection between shallow and deep features, adaptively combining the encoder and decoder features. Four datasets comprising CT images, magnetic resonance (MR) images, and H&E-stained slide images were used to assess the performance of the proposed network. Results: Experiments were performed using four different types of medical image datasets. For the COVID-DS36 dataset, our method achieved a Dice similarity coefficient (DSC) of 81.23%. For the GlaS dataset, 89.95% DSC and 82.39% intersection over union (IoU) were obtained.
On the Synapse dataset, the average DSC was 77.48% and the average Hausdorff distance (HD) was 31.69 mm. For the I2CVB dataset, 92.3% DSC and 85.8% IoU were obtained. Conclusions: The experimental results demonstrate that the proposed model has an excellent generalization ability and outperforms other state-of-the-art methods. It is expected to be a potent tool to assist clinicians in auxiliary diagnosis and to promote the development of medical intelligence technology. (c) 2023 Elsevier B.V. All rights reserved.
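The abstract describes the feature fusion module (FFM) as performing a "soft selection" that adaptively blends shallow encoder features with deep decoder features. The sketch below is a hypothetical NumPy illustration of that general idea, not the paper's actual implementation: a channel-wise sigmoid gate, computed from globally pooled context of both branches, weights each branch before summing. All names and the gating form (`w_gate`, `b_gate`, global average pooling) are assumptions made for illustration.

```python
import numpy as np

def feature_fusion(shallow, deep, w_gate, b_gate):
    """Illustrative soft-selection fusion (assumed form, not the paper's FFM).

    shallow, deep : arrays of shape (C, H, W), encoder and decoder features.
    w_gate        : (C, 2C) projection from pooled context to per-channel gates.
    b_gate        : (C,) gate bias.
    """
    # Global context: average each channel over the spatial dimensions -> (2C,)
    ctx = np.concatenate([shallow, deep], axis=0).mean(axis=(1, 2))
    # Channel-wise soft-selection weights in [0, 1] via a sigmoid gate
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ ctx + b_gate)))
    g = gate[:, None, None]
    # Adaptive convex combination of the two feature branches
    return g * shallow + (1.0 - g) * deep

rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
shallow = rng.standard_normal((C, H, W))
deep = rng.standard_normal((C, H, W))
w_gate = 0.1 * rng.standard_normal((C, 2 * C))
b_gate = np.zeros(C)
fused = feature_fusion(shallow, deep, w_gate, b_gate)
print(fused.shape)
```

Because the gate lies in [0, 1] per channel, the output is everywhere a convex combination of the two inputs, so the fusion can smoothly emphasize either branch without discarding the other.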
Pages: 13