UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation

Cited by: 21

Authors:

Shaker, Abdelrahman [1]

Maaz, Muhammad [1]

Rasheed, Hanoona [1]

Khan, Salman [1]

Yang, Ming-Hsuan [2, 3, 4]

Khan, Fahad Shahbaz [5, 6]

Affiliations:

[1] Mohamed Bin Zayed University of Artificial Intelligence, Computer Vision Department, Abu Dhabi, United Arab Emirates

[2] University of California, Merced, Electrical Engineering and Computer Science Department, Merced, CA 95343 USA

[3] Yonsei University, College of Computing, Seoul 03722, South Korea

[4] Google, Mountain View, CA 95344 USA

[5] Mohamed Bin Zayed University, Abu Dhabi, United Arab Emirates

[6] Linkoping University, Electrical Engineering Department, S-58183 Linkoping, Sweden

Source:

IEEE TRANSACTIONS ON MEDICAL IMAGING | 2024 / Vol. 43 / No. 09

Keywords:

Image segmentation; Three-dimensional displays; Transformers; Biomedical imaging; Complexity theory; Graphics processing units; Task analysis; Deep learning; efficient attention; hybrid architecture; medical image segmentation; TRANSFORMER;

DOI:

10.1109/TMI.2024.3398728

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification code:

081203; 0835

Abstract:

Owing to the success of transformer models, recent works have studied their applicability to 3D medical segmentation tasks. Within transformer models, the self-attention mechanism is one of the main building blocks; it strives to capture long-range dependencies, in contrast to local convolution-based designs. However, the self-attention operation has quadratic complexity, which proves to be a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed. The core of our design is a novel efficient paired attention (EPA) block that learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient and has linear complexity with respect to the input size. To enable communication between the spatial and channel-focused branches, we share the weights of the query and key mapping functions, which provides a complementary benefit (paired attention) while also reducing the complexity. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, our UNETR++ sets a new state of the art with a Dice score of 87.2%, while significantly reducing parameters and FLOPs by over 71% compared to the best method in the literature. Our code and models are available at: https://tinyurl.com/2p87x5xn.
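The abstract does not spell out the EPA computations. The NumPy code here is a simplified, single-head illustration of the idea under our own assumptions: the spatial branch reduces keys and values from n tokens to p tokens with a projection matrix `E` (a Linformer-style reduction, random here rather than learned), the channel branch computes a C × C attention map, and both branches reuse the same query and key projections. The function name `paired_attention_sketch`, the weight initialisation, the scaling factors, and the omission of any fusion of the two branch outputs are all hypothetical; this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def paired_attention_sketch(x, p, rng):
    """Hypothetical single-head sketch of paired spatial/channel attention.

    x: (n, C) array of n voxel tokens with C channels.
    p: number of projected tokens in the spatial branch (p << n).
    Returns the spatial-branch and channel-branch outputs, each (n, C).
    """
    n, C = x.shape
    # Shared query/key projections (assumption: plain random linear maps)
    W_q = rng.standard_normal((C, C)) / np.sqrt(C)
    W_k = rng.standard_normal((C, C)) / np.sqrt(C)
    # Separate value projections for each branch (assumption)
    W_vs = rng.standard_normal((C, C)) / np.sqrt(C)
    W_vc = rng.standard_normal((C, C)) / np.sqrt(C)
    # Token-axis projection n -> p (Linformer-style; assumed, not learned here)
    E = rng.standard_normal((p, n)) / np.sqrt(n)

    q, k = x @ W_q, x @ W_k          # shared by both branches, (n, C) each
    v_s, v_c = x @ W_vs, x @ W_vc    # branch-specific values, (n, C) each

    # Spatial branch: keys/values reduced to p tokens, so scores are (n, p),
    # costing O(n * p * C), i.e. linear in n for fixed p
    k_p, v_p = E @ k, E @ v_s        # (p, C)
    attn_s = softmax(q @ k_p.T / np.sqrt(C), axis=-1)  # (n, p)
    out_s = attn_s @ v_p             # (n, C)

    # Channel branch: attention over channels, scores are (C, C),
    # costing O(n * C^2); the 1/sqrt(n) scaling is an assumption
    attn_c = softmax(q.T @ k / np.sqrt(n), axis=-1)    # (C, C)
    out_c = v_c @ attn_c.T           # (n, C)
    return out_s, out_c

# Usage: a 16 x 16 x 16 volume flattened to n = 4096 tokens with C = 32 channels
rng = np.random.default_rng(0)
n, C, p = 16 * 16 * 16, 32, 64
x = rng.standard_normal((n, C))
out_s, out_c = paired_attention_sketch(x, p, rng)
print(out_s.shape, out_c.shape)  # (4096, 32) (4096, 32)
```

With n = 4096 and p = 64, the spatial score matrix in this sketch has 262,144 entries, versus 16,777,216 for full n × n self-attention, which illustrates why a reduced formulation of this kind scales linearly in n for a fixed p.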


Pages: 3377–3390

Page count: 14

References: 47

