UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation

Times cited: 21
Authors
Shaker, Abdelrahman [1]
Maaz, Muhammad [1]
Rasheed, Hanoona [1]
Khan, Salman [1]
Yang, Ming-Hsuan [2,3,4]
Khan, Fahad Shahbaz [5,6]
Affiliations
[1] Mohamed Bin Zayed Univ Artificial Intelligence, Comp Vis Dept, Abu Dhabi, U Arab Emirates
[2] Univ Calif Merced, Elect Engn & Comp Sci Dept, Merced, CA 95343 USA
[3] Yonsei Univ, Coll Comp, Seoul 03722, South Korea
[4] Google, Mountain View, CA 95344 USA
[5] Mohamed Bin Zayed Univ, Abu Dhabi, U Arab Emirates
[6] Linkoping Univ, Elect Engn Dept, S-58183 Linkoping, Sweden
Keywords
Image segmentation; Three-dimensional displays; Transformers; Biomedical imaging; Complexity theory; Graphics processing units; Task analysis; Deep learning; efficient attention; hybrid architecture; medical image segmentation; Transformer
DOI
10.1109/TMI.2024.3398728
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Owing to the success of transformer models, recent works study their applicability to 3D medical segmentation tasks. Within transformer models, the self-attention mechanism is one of the main building blocks and strives to capture long-range dependencies, in contrast to local convolution-based designs. However, self-attention has quadratic complexity in the number of input tokens, which makes it a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D volumes with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed. The core of our design is a novel efficient paired attention (EPA) block that learns spatially and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient and has linear complexity with respect to the input size. To enable communication between the spatial and channel-focused branches, we share the weights of the query and key mapping functions, which provides a complementary benefit (paired attention) while also reducing complexity. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, UNETR++ sets a new state of the art with a Dice score of 87.2%, while significantly reducing parameters and FLOPs by over 71% compared to the best method in the literature. Our code and models are available at: https://tinyurl.com/2p87x5xn.
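The abstract describes the EPA block only at a high level. The NumPy sketch below illustrates the idea under stated assumptions; it is not the authors' implementation. Query and key weights (`w_q`, `w_k`) are shared by both branches. The spatial branch shrinks keys and values from n tokens to p tokens with hypothetical projection matrices `proj_k` and `proj_v` (a Linformer-style choice assumed here), so its score matrix is n x p rather than n x n. The channel branch builds a d x d map between feature channels. The two outputs are simply summed. The function name, the scaling factors, and the summation fusion are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def paired_attention(x, w_q, w_k, w_vs, w_vc, proj_k, proj_v):
    """Hypothetical sketch of paired spatial/channel attention (not the official code).

    x               : (n, d) tokens of a flattened 3D volume, n = H * W * D
    w_q, w_k        : (d, d) query/key weights SHARED by both branches
    w_vs, w_vc      : (d, d) value weights for the spatial / channel branch
    proj_k, proj_v  : (p, n) assumed projections shrinking n tokens to p << n
    """
    n, d = x.shape
    q = x @ w_q        # (n, d) shared query
    k = x @ w_k        # (n, d) shared key
    v_s = x @ w_vs     # (n, d) spatial-branch values
    v_c = x @ w_vc     # (n, d) channel-branch values

    # Spatial branch: attend over p projected tokens instead of all n,
    # so the score matrix is (n, p) and the cost grows linearly with n.
    k_p = proj_k @ k                                   # (p, d)
    v_p = proj_v @ v_s                                 # (p, d)
    a_s = softmax(q @ k_p.T / np.sqrt(d), axis=-1)     # (n, p)
    out_s = a_s @ v_p                                  # (n, d)

    # Channel branch: a (d, d) channel-to-channel map whose size does not
    # depend on n; scaling by sqrt(n) is an assumption of this sketch.
    a_c = softmax(q.T @ k / np.sqrt(n), axis=-1)       # (d, d)
    out_c = v_c @ a_c.T                                # (n, d)

    # Fusion by summation is an assumption; the paper's fusion may differ.
    return out_s + out_c


rng = np.random.default_rng(0)
n, d, p = 8 * 8 * 8, 16, 32   # tiny 8x8x8 volume, 16 channels, 32 projected tokens
x = rng.standard_normal((n, d))
w_q, w_k, w_vs, w_vc = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
proj_k, proj_v = (rng.standard_normal((p, n)) / np.sqrt(n) for _ in range(2))

y = paired_attention(x, w_q, w_k, w_vs, w_vc, proj_k, proj_v)
print(y.shape)           # (512, 16)
print(n * n, n * p)      # score-matrix entries: full attention vs. projected spatial branch
```

Even for this toy 8x8x8 volume, full self-attention would need a 512 x 512 score matrix (262,144 entries), while the projected spatial branch needs 512 x 32 (16,384 entries); the gap widens as the number of voxels grows, which is the motivation for the linear-complexity formulation.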
Pages: 3377-3390
Number of pages: 14