MS-UNet: Swin Transformer U-Net with Multi-scale Nested Decoder for Medical Image Segmentation with Small Training Data

Cited by: 1
Authors
Chen, Haoyuan [1]
Han, Yufei [1]
Li, Yanyi [1]
Xu, Pin [1]
Li, Kuan [1]
Yin, Jianping [1]
Affiliations
[1] Dongguan Univ Technol, Dongguan, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XIII | 2024, Vol. 14437
Keywords
Medical Image Segmentation; U-Net; Swin Transformer; Multi-scale Nested Decoder
DOI
10.1007/978-981-99-8558-6_39
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this study, we propose MS-UNet, a novel U-Net model for medical image segmentation. Instead of the single-layer U-Net decoder structure used in Swin-UNet and TransUNet, we design a multi-scale nested decoder based on the Swin Transformer. The design is motivated by the observation that the single-layer decoder of U-Net is too "thin" to exploit enough information, which results in large semantic differences between the encoder and decoder features. The problem worsens when the training set is not sufficiently large, which is common in medical image processing, where annotated data are more difficult to obtain than in other tasks. The proposed multi-scale nested decoder brings the feature maps of the decoder and encoder semantically closer, enabling the network to learn more detailed features. Experimental results show that MS-UNet learns features more efficiently and improves segmentation performance, especially in the extreme case of a small amount of training data. The code is publicly available at: https://github.com/HH446/MS-UNet.
Pages: 472-483
Page count: 12