UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation

Cited by: 21
Authors
Shaker, Abdelrahman [1 ]
Maaz, Muhammad [1 ]
Rasheed, Hanoona [1 ]
Khan, Salman [1 ]
Yang, Ming-Hsuan [2 ,3 ,4 ]
Khan, Fahad Shahbaz [5 ,6 ]
Affiliations
[1] Mohamed Bin Zayed Univ Artificial Intelligence, Comp Vis Dept, Abu Dhabi, U Arab Emirates
[2] Univ Calif Merced, Elect Engn & Comp Sci Dept, Merced, CA 95343 USA
[3] Yonsei Univ, Coll Comp, Seoul 03722, South Korea
[4] Google, Mountain View, CA 95344 USA
[5] Mohamed Bin Zayed Univ, Abu Dhabi, U Arab Emirates
[6] Linkoping Univ, Elect Engn Dept, S-58183 Linkoping, Sweden
Keywords
Image segmentation; Three-dimensional displays; Transformers; Biomedical imaging; Complexity theory; Graphics processing units; Task analysis; Deep learning; efficient attention; hybrid architecture; medical image segmentation; TRANSFORMER;
DOI
10.1109/TMI.2024.3398728
CLC number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within transformer models, the self-attention mechanism is one of the main building blocks, striving to capture long-range dependencies in contrast to local convolutional designs. However, the self-attention operation has quadratic complexity, which becomes a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient and has linear complexity with respect to the input. To enable communication between the spatial and channel-focused branches, we share the weights of the query and key mapping functions, which provides a complementary benefit (paired attention) while also reducing complexity. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, our UNETR++ sets a new state-of-the-art with a Dice Score of 87.2%, while significantly reducing parameters and FLOPs by over 71%, compared to the best method in the literature. Our code and models are available at: https://tinyurl.com/2p87x5xn.
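The paired-attention idea summarized in the abstract can be illustrated with a minimal NumPy sketch: queries and keys come from one shared projection used by both branches, the spatial branch attends over a small set of p projected tokens (so its cost is linear in the number of tokens N), and the channel branch computes a C x C attention map over feature channels. All names, the projection size p, and the simple additive fusion below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def epa_block(x, w_qk, w_vs, w_vc, w_proj):
    """Sketch of an efficient paired attention (EPA) block.

    x      : (N, C) flattened volumetric tokens
    w_qk   : (C, C) shared query/key projection (the "paired" weight sharing)
    w_vs   : (C, C) value projection for the spatial branch
    w_vc   : (C, C) value projection for the channel branch
    w_proj : (p, N) learned projection shrinking N tokens to p << N,
             which makes the spatial branch linear in N
    """
    n, c = x.shape
    q = x @ w_qk            # shared queries
    k = x @ w_qk            # shared keys (same weights -> paired attention)
    v_s = x @ w_vs
    v_c = x @ w_vc

    # Spatial branch: attend over p projected tokens instead of all N,
    # so the attention map is (N, p) rather than (N, N).
    k_p = w_proj @ k                                   # (p, C)
    v_p = w_proj @ v_s                                 # (p, C)
    attn_s = softmax(q @ k_p.T / np.sqrt(c), axis=-1)  # (N, p)
    out_s = attn_s @ v_p                               # (N, C)

    # Channel branch: attention over the C channels, producing a
    # (C, C) map that reweights feature channels.
    attn_c = softmax(q.T @ k / np.sqrt(n), axis=-1)    # (C, C)
    out_c = v_c @ attn_c                               # (N, C)

    # Fuse the two branches (a plain sum here; the actual model fuses
    # branch outputs with convolutional layers).
    return out_s + out_c

# Toy usage: 64 tokens, 32 channels, projected down to p = 8 tokens.
rng = np.random.default_rng(0)
n, c, p = 64, 32, 8
x = rng.standard_normal((n, c))
out = epa_block(
    x,
    w_qk=rng.standard_normal((c, c)) / np.sqrt(c),
    w_vs=rng.standard_normal((c, c)) / np.sqrt(c),
    w_vc=rng.standard_normal((c, c)) / np.sqrt(c),
    w_proj=rng.standard_normal((p, n)) / np.sqrt(n),
)
print(out.shape)  # (64, 32)
```

Note how the spatial branch's attention matrix has shape (N, p) with p fixed, so doubling the number of voxels doubles, rather than quadruples, its cost.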
Pages: 3377-3390
Page count: 14