MS-UNet: Swin Transformer U-Net with Multi-scale Nested Decoder for Medical Image Segmentation with Small Training Data

Cited by: 1

Authors:

Chen, Haoyuan [1]

Han, Yufei [1]

Li, Yanyi [1]

Xu, Pin [1]

Li, Kuan [1]

Yin, Jianping [1]

Affiliations:

[1] Dongguan University of Technology, Dongguan, People's Republic of China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XIII | 2024 / Vol. 14437

Keywords:

Medical Image Segmentation; U-Net; Swin Transformer; Multi-scale Nested Decoder

DOI:

10.1007/978-981-99-8558-6_39

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a novel U-Net model named MS-UNet for medical image segmentation. Instead of the single-layer U-Net decoder structure used in Swin-UNet and TransUNet, we design a multi-scale nested decoder based on the Swin Transformer for U-Net. The framework is motivated by the observation that the single-layer decoder structure of U-Net is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. The problem worsens when the training set is not sufficiently large, which is common in medical image processing, where annotated data are harder to obtain than in other tasks. The proposed multi-scale nested decoder structure brings the feature maps of the decoder and encoder semantically closer, enabling the network to learn more detailed features. Experimental results show that MS-UNet can effectively improve network performance through more efficient feature learning, especially in the extreme case of a small amount of training data. The code is publicly available at: https://github.com/HH446/MS-UNet.
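
The abstract does not spell out how the nested decoder is wired. As a rough illustration only, the sketch below assumes a UNet++-style dense nesting, which is an assumption and not something this record confirms for MS-UNet: column 0 holds the encoder outputs, and each later node takes every earlier node on its own resolution level plus the upsampled node one level deeper. The function name `nested_decoder_inputs` and the `(level, column)` indexing are hypothetical and chosen for illustration; the sketch enumerates connections and contains no Swin Transformer blocks.

```python
def nested_decoder_inputs(depth):
    """Map each nested decoder node (level, column) to the nodes feeding it.

    Assumption: UNet++-style nesting, not the authors' confirmed design.
    Column 0 is the encoder output at that resolution level. A node in
    column j > 0 receives every earlier node on its own level (dense skip
    connections) plus the upsampled node from one level deeper in the
    previous column.
    """
    graph = {}
    for col in range(1, depth):
        # Each column to the right has one fewer level than the previous one.
        for level in range(depth - col):
            same_level = [(level, k) for k in range(col)]
            from_below = (level + 1, col - 1)
            graph[(level, col)] = same_level + [from_below]
    return graph


if __name__ == "__main__":
    graph = nested_decoder_inputs(depth=4)
    # Print column by column so the nesting is easy to read.
    for node in sorted(graph, key=lambda n: (n[1], n[0])):
        print(node, "<-", graph[node])
```

Under this assumption, a plain single-path decoder would keep only the last node of each level, while the nested variant adds intermediate nodes between encoder and decoder, which is one way the two sides could become "semantically closer".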


Pages: 472-483

Page count: 12
