Multiscale TransUNet++: dense hybrid U-Net with transformer for medical image segmentation

Cited by: 20
Authors
Wang, Bo [1]
Wang, Fan [1]
Dong, Pengwei [1]
Li, Chongyi [2]
Affiliations
[1] Ningxia Univ, Sch Phys & Elect Elect Engn, Yinchuan 750021, Ningxia, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Natural Science Foundation of China
Keywords
Medical image segmentation; CNN; Transformer; Weighted loss function; ATTENTION; NETWORKS;
DOI
10.1007/s11760-021-02115-w
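Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract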
Automatic medical image segmentation assists doctors and is important for the diagnosis and treatment of various diseases. TransUNet, which integrates the advantages of the transformer and the CNN, has achieved success in medical image segmentation tasks. However, TransUNet simply combines encoder and decoder feature maps via skip connections at the same resolution, which is an unnecessarily restrictive fusion design. Moreover, the positional encoding and input tokens in the standard transformer blocks of TransUNet have a fixed scale, which is not suitable for dense prediction. To alleviate these problems, we propose a novel architecture named multiscale TransUNet++ (MS-TransUNet++), which employs a multiscale and flexible feature fusion scheme between encoder and decoder at different levels. The novel skip connections densely bridge the extracted feature representations at different resolutions, and the hybrid CNN-Transformer encoder, which captures long-range dependencies, passes its high-level features directly to each stage of the decoder. In addition, to obtain more effective feature representations, an efficient multiscale visual transformer is introduced for the feature encoder. More importantly, we employ a weighted loss function composed of focal loss, multiscale structural similarity (MS-SSIM) and the Jaccard index to penalize the training error of medical image segmentation, jointly realizing pixel-level, patch-level and map-level optimization. Extensive experimental results demonstrate that the proposed multiscale TransUNet++ achieves competitive performance for prostate MR and liver CT image segmentation.
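The weighted loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default weights, the focusing parameter `gamma`, and the flattened-list input format are all assumptions made for illustration. The focal term acts at pixel level and the soft Jaccard term at map level; the patch-level MS-SSIM term is left as an optional user-supplied callable (`ssim_loss`, hypothetical) because it needs a multiscale windowed image comparison that is beyond a short example.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flattened pixels (pixel-level term)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
        pt = p if t == 1 else 1.0 - p        # probability assigned to the true class
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)

def soft_jaccard_loss(probs, targets, eps=1e-7):
    """1 - soft Jaccard index over the whole map (map-level term)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(probs) + sum(targets) - inter
    return 1.0 - (inter + eps) / (union + eps)

def weighted_loss(probs, targets, w_focal=1.0, w_jaccard=1.0,
                  w_ssim=1.0, ssim_loss=None):
    """Weighted sum of the terms; weights here are illustrative, not from the paper."""
    loss = (w_focal * focal_loss(probs, targets)
            + w_jaccard * soft_jaccard_loss(probs, targets))
    if ssim_loss is not None:                # optional patch-level term (assumed callable)
        loss += w_ssim * ssim_loss(probs, targets)
    return loss

# Usage: a confident correct prediction scores lower than an uncertain one.
mask = [1, 0, 1, 0]
good = weighted_loss([1.0, 0.0, 1.0, 0.0], mask)
bad = weighted_loss([0.5, 0.5, 0.5, 0.5], mask)
```

In a real training pipeline these terms would be written with a tensor library so that gradients flow through them; the pure-Python version above only shows how the terms combine.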
Pages: 1607-1614
Number of pages: 8