UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation

Cited by: 21
Authors
Shaker, Abdelrahman [1 ]
Maaz, Muhammad [1 ]
Rasheed, Hanoona [1 ]
Khan, Salman [1 ]
Yang, Ming-Hsuan [2 ,3 ,4 ]
Khan, Fahad Shahbaz [5 ,6 ]
Affiliations
[1] Mohamed Bin Zayed Univ Artificial Intelligence, Comp Vis Dept, Abu Dhabi, U Arab Emirates
[2] Univ Calif Merced, Elect Engn & Comp Sci Dept, Merced, CA 95343 USA
[3] Yonsei Univ, Coll Comp, Seoul 03722, South Korea
[4] Google, Mountain View, CA 95344 USA
[5] Mohamed Bin Zayed Univ, Abu Dhabi, U Arab Emirates
[6] Linkoping Univ, Elect Engn Dept, S-58183 Linkoping, Sweden
Keywords
Image segmentation; Three-dimensional displays; Transformers; Biomedical imaging; Complexity theory; Graphics processing units; Task analysis; Deep learning; efficient attention; hybrid architecture; medical image segmentation; TRANSFORMER;
DOI
10.1109/TMI.2024.3398728
CLC number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within transformer models, the self-attention mechanism is one of the main building blocks, striving to capture long-range dependencies in contrast to local convolutional designs. However, the self-attention operation has quadratic complexity, which becomes a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient and has linear complexity with respect to the input. To enable communication between the spatial and channel-focused branches, we share the weights of the query and key mapping functions, which provides a complementary benefit (paired attention) while also reducing complexity. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, our UNETR++ sets a new state-of-the-art with a Dice Score of 87.2%, while significantly reducing parameters and FLOPs by over 71%, compared to the best method in the literature. Our code and models are available at: https://tinyurl.com/2p87x5xn.
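The paired-attention idea summarized in the abstract can be illustrated with a minimal NumPy sketch: queries and keys come from one shared projection used by both branches, the spatial branch attends over a small set of p projected tokens (so its cost is linear in the number of tokens N), and the channel branch computes a C x C attention map over feature channels. All names, the projection size p, and the simple additive fusion below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def epa_block(x, w_qk, w_vs, w_vc, w_proj):
    """Sketch of an efficient paired attention (EPA) block.

    x      : (N, C) flattened volumetric tokens
    w_qk   : (C, C) shared query/key projection (the "paired" weight sharing)
    w_vs   : (C, C) value projection for the spatial branch
    w_vc   : (C, C) value projection for the channel branch
    w_proj : (p, N) learned projection shrinking N tokens to p << N,
             which makes the spatial branch linear in N
    """
    n, c = x.shape
    q = x @ w_qk            # shared queries
    k = x @ w_qk            # shared keys (same weights -> paired attention)
    v_s = x @ w_vs
    v_c = x @ w_vc

    # Spatial branch: attend over p projected tokens instead of all N,
    # so the attention map is (N, p) rather than (N, N).
    k_p = w_proj @ k                                   # (p, C)
    v_p = w_proj @ v_s                                 # (p, C)
    attn_s = softmax(q @ k_p.T / np.sqrt(c), axis=-1)  # (N, p)
    out_s = attn_s @ v_p                               # (N, C)

    # Channel branch: attention over the C channels, producing a
    # (C, C) map that reweights feature channels.
    attn_c = softmax(q.T @ k / np.sqrt(n), axis=-1)    # (C, C)
    out_c = v_c @ attn_c                               # (N, C)

    # Fuse the two branches (a plain sum here; the actual model fuses
    # branch outputs with convolutional layers).
    return out_s + out_c

# Toy usage: 64 tokens, 32 channels, projected down to p = 8 tokens.
rng = np.random.default_rng(0)
n, c, p = 64, 32, 8
x = rng.standard_normal((n, c))
out = epa_block(
    x,
    w_qk=rng.standard_normal((c, c)) / np.sqrt(c),
    w_vs=rng.standard_normal((c, c)) / np.sqrt(c),
    w_vc=rng.standard_normal((c, c)) / np.sqrt(c),
    w_proj=rng.standard_normal((p, n)) / np.sqrt(n),
)
print(out.shape)  # (64, 32)
```

Note how the spatial branch's attention matrix has shape (N, p) with p fixed, so doubling the number of voxels doubles, rather than quadruples, its cost.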
Pages: 3377-3390
Page count: 14