DENSELY CONNECTED SWIN-UNET FOR MULTISCALE INFORMATION AGGREGATION IN MEDICAL IMAGE SEGMENTATION

Cited by: 8
Authors
Wang, Ziyang [1 ]
Su, Meiwen [2 ]
Zheng, Jian-Qing [3 ]
Liu, Yang [4 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] Univ Hong Kong, Dept Stat & Actuarial Sci, Hong Kong, Peoples R China
[3] Univ Oxford, Kennedy Inst Rheumatol, Oxford, England
[4] Univ Plymouth, Dept Comp Sci, Plymouth, Devon, England
Source
2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP | 2023
Keywords
Semantic Segmentation; UNet; Vision Transformer
DOI
10.1109/ICIP49359.2023.10222451
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image semantic segmentation is a dense prediction task in computer vision that has been dominated by deep learning techniques in recent years. UNet, a symmetric encoder-decoder end-to-end Convolutional Neural Network (CNN) with skip connections, has shown promising performance. Aiming to process multiscale feature information efficiently, we propose a new Densely Connected Swin-UNet (DCS-UNet) with multiscale information aggregation for medical image segmentation. First, inspired by the Swin Transformer, which models long-range dependencies via shifted-window self-attention, this work proposes fully ViT-based network blocks with a shifted-window approach, resulting in a purely self-attention-based U-shaped segmentation network. The relevant layers, including feature sampling and image tokenization, are redesigned in the ViT style. Second, a full-scale deep supervision scheme is developed to process the aggregated feature maps at the various resolutions generated by different levels of decoders. Third, dense skip connections are proposed that allow semantic feature information to be thoroughly transferred from different levels of encoders to lower-level decoders. Our proposed method is validated on a public benchmark MRI cardiac segmentation dataset, with comprehensive evaluation metrics showing competitive performance against other encoder-decoder network variants. The code is available at https://github.com/ziyangwang007/VIT4UNet.
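The dense skip connections described in the abstract can be sketched in a few lines: each decoder level aggregates, by channel-wise concatenation, the features from its own encoder level and every deeper level, resized to its resolution. The following is an illustrative NumPy sketch, not the authors' implementation; the channel widths, the nearest-neighbour `resize` stand-in for patch merging/expanding, and the function names are all assumptions for demonstration.

```python
import numpy as np

def resize(x, target_hw):
    # Nearest-neighbour resize of a (C, H, W) feature map; a stand-in
    # for the patch-merging/patch-expanding layers of the real network.
    C, H, W = x.shape
    th, tw = target_hw
    rows = np.arange(th) * H // th
    cols = np.arange(tw) * W // tw
    return x[:, rows][:, :, cols]

# Toy encoder features at four scales (channels, height, width),
# doubling channels and halving resolution at each level, as in a
# U-shaped hierarchy.
enc = [np.random.rand(16 * 2**k, 32 // 2**k, 32 // 2**k) for k in range(4)]

def dense_skip(enc, i):
    # Decoder level i receives every encoder feature from level i
    # downward in the hierarchy, resized to level i's resolution and
    # concatenated along the channel axis.
    target = enc[i].shape[1:]
    feats = [resize(enc[j], target) for j in range(i, len(enc))]
    return np.concatenate(feats, axis=0)

agg0 = dense_skip(enc, 0)
print(agg0.shape)  # (240, 32, 32): channels 16 + 32 + 64 + 128 = 240
```

In the full model, each aggregated map would then pass through a decoder block, and full-scale deep supervision would attach an auxiliary loss to the prediction produced at each resolution.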
Pages: 940-944
Page count: 5