DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Cited by: 588
Authors
Lin, Ailiang [1]
Chen, Bingzhi [1]
Xu, Jiayu [1]
Zhang, Zheng [1]
Lu, Guangming [1]
Zhang, David [2]
Affiliations
[1] Harbin Inst Technol, Shenzhen Med Biometr Percept & Anal Engn Lab, Shenzhen 518055, Peoples R China
[2] Chinese Univ Hong Kong, Sch Sci & Engn, Shenzhen 518055, Peoples R China
Keywords
Transformers; Image segmentation; Semantics; Decoding; Computer architecture; Task analysis; Medical diagnostic imaging; Hierarchical Swin Transformer; long-range contextual information; medical image segmentation; transformer interactive fusion (TIF) module
DOI
10.1109/TIM.2022.3178991
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Automatic medical image segmentation has made great progress owing to powerful deep representation learning. Inspired by the success of the self-attention mechanism in transformers, considerable effort has been devoted to designing robust transformer-based variants of the encoder-decoder architecture. However, the patch division used in existing transformer-based models usually ignores the pixel-level intrinsic structural features inside each patch. In this article, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which incorporates the hierarchical Swin Transformer into both the encoder and the decoder of the standard U-shaped architecture. DS-TransUNet benefits from the self-attention computation in the Swin Transformer and the proposed dual-scale encoding, which together model non-local dependencies and multiscale contexts to improve the semantic segmentation quality of diverse medical images. Unlike many prior transformer-based solutions, DS-TransUNet adopts a dual-scale encoding mechanism in which two Swin Transformer encoders extract coarse- and fine-grained feature representations at different semantic scales. Meanwhile, a transformer interactive fusion (TIF) module is proposed to perform multiscale information fusion effectively through the self-attention mechanism. Furthermore, we introduce the Swin Transformer block into the decoder to further exploit long-range contextual information during up-sampling. Extensive experiments across four typical medical image segmentation tasks demonstrate the effectiveness of DS-TransUNet, which significantly outperforms state-of-the-art methods.
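The core mechanism named in the abstract, multiscale fusion of the two encoders' token sequences through self-attention in the TIF module, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration reconstructed from the abstract's description alone, not the authors' released implementation; the pooled cross-scale summary token, the pre-norm transformer layer, and the shared embedding dimension dim are assumptions made here for clarity.

import torch
import torch.nn as nn

class TIFBranch(nn.Module):
    # One direction of a TIF-style fusion (sketch): self-attention over one
    # scale's tokens augmented with a pooled summary token of the other scale.
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, tokens: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C) sequence of the current scale
        # other:  (B, M, C) sequence of the other scale (assumed same C)
        summary = other.mean(dim=1, keepdim=True)        # (B, 1, C) global token
        seq = torch.cat([summary, tokens], dim=1)        # prepend cross-scale token
        x = self.norm(seq)
        seq = seq + self.attn(x, x, x, need_weights=False)[0]  # self-attention
        seq = seq + self.ffn(seq)                        # feed-forward refinement
        return seq[:, 1:]                                # drop the summary token

class TIF(nn.Module):
    # Fuses coarse- and fine-grained token sequences in both directions.
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.coarse_branch = TIFBranch(dim, num_heads)
        self.fine_branch = TIFBranch(dim, num_heads)

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor):
        return self.coarse_branch(coarse, fine), self.fine_branch(fine, coarse)

# Hypothetical usage: fuse 7x7 coarse tokens with 14x14 fine tokens.
# tif = TIF(dim=96)
# c, f = tif(torch.randn(2, 49, 96), torch.randn(2, 196, 96))

In the actual model the two encoder branches operate on different patch sizes, so a projection to a common channel dimension would be needed before a fusion step like this one.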
Pages: 15