SMESwin Unet: Merging CNN and Transformer for Medical Image Segmentation

Cited by: 31

Authors:

Wang, Ziheng [1]

Min, Xiongkuo [2]

Shi, Fangyu [2]

Jin, Ruinian [1]

Nawrin, Saida S. [1]

Yu, Ichen [3]

Nagatomi, Ryoichi [1,3]

Affiliations:

[1] Tohoku Univ, Grad Sch Biomed Engn, Div Biomed Engn Hlth & Welf, Sendai, Japan

[2] Shanghai Jiao Tong Univ, Inst Image Commun & Network Engn, Shanghai, Peoples R China

[3] Tohoku Univ, Grad Sch Med, Dept Med & Sci Sports & Exercise, Sendai, Japan

Source:

MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT V | 2022 / Vol. 13435

Keywords:

DOI:

10.1007/978-3-031-16443-9_50

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Vision transformers (ViTs) have become the favored paradigm in medical image segmentation since last year, surpassing their traditional CNN counterparts in quantitative metrics. The main advantage of ViTs is that their attention layers model global relations between tokens. However, this increased representation capacity comes with corresponding shortcomings: ViTs lack the inductive biases of CNNs, namely locality, translation invariance, and the hierarchical structure of visual information. Consequently, well-trained ViTs require more data than CNNs. Because high-quality data in medical imaging is always limited, we propose SMESwin Unet. First, based on the Channel-wise Cross fusion Transformer (CCT), we fuse multi-scale semantic features and attention maps by designing a compound structure of CNN and ViTs (named MCCT). Second, we introduce superpixels by dividing the pixel-level features into district-level features to avoid interference from meaningless parts of the image. Finally, we use External Attention to consider the correlations among all data samples, which may further reduce the limitation of small datasets. According to our experiments, the proposed superpixel and MCCT-based Swin Unet (SMESwin Unet) achieves better performance than CNNs and other Transformer-based architectures on three medical image segmentation datasets (nucleus, cells, and glands).
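The abstract does not say how the district-level features are computed. As a rough illustration only, one common way to move from pixel-level to region-level features is to average the features inside each superpixel and write that mean back to every pixel of the region. The function name `superpixel_mean_pool`, the single-channel layout, and the hand-written label map (which in practice would come from a superpixel algorithm such as SLIC) are all assumptions made for this sketch, not the authors' implementation.

```python
from collections import defaultdict


def superpixel_mean_pool(features, labels):
    """Replace each pixel's feature with the mean over its superpixel.

    features: 2D list of floats (H x W), a single channel for brevity.
    labels:   2D list of ints (H x W), one superpixel id per pixel.
    Both names and layouts are hypothetical, chosen for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Accumulate the feature sum and pixel count of every region.
    for feat_row, label_row in zip(features, labels):
        for value, region in zip(feat_row, label_row):
            sums[region] += value
            counts[region] += 1
    means = {region: sums[region] / counts[region] for region in sums}
    # Broadcast each region mean back to the pixels of that region.
    return [[means[region] for region in label_row] for label_row in labels]


# Toy example: region 0 covers the left 2x2 block, region 1 the right column.
features = [[1.0, 3.0, 10.0],
            [2.0, 2.0, 14.0]]
labels = [[0, 0, 1],
          [0, 0, 1]]
pooled = superpixel_mean_pool(features, labels)
```

In this toy input, every pixel of region 0 receives the mean of 1, 3, 2, and 2, and every pixel of region 1 receives the mean of 10 and 14, so pixel-level variation inside a region no longer reaches later layers.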

Pages: 517-526

Page count: 10
