UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Cited by: 366
Authors
Gao, Yunhe [1]
Zhou, Mu [1, 2]
Metaxas, Dimitris N. [1 ]
Affiliations
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
[2] SenseBrain & Shanghai AI Lab & Ctr Perceptual & I, Shanghai, Peoples R China
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT III | 2021 / Vol. 12903
DOI
10.1007/978-3-030-87199-4_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Transformer architecture has proven successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network to enhance medical image segmentation. UTNet applies self-attention modules in both the encoder and the decoder to capture long-range dependencies at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism, along with relative position encoding, that significantly reduces the complexity of the self-attention operation from O(n^2) to approximately O(n). A new self-attention decoder is also proposed to recover fine-grained details from the skip connections in the encoder. Our approach addresses the dilemma that Transformers require huge amounts of data to learn vision inductive bias. Our hybrid layer design allows Transformer layers to be initialized within convolutional networks without the need for pre-training. We have evaluated UTNet on a multi-label, multi-vendor cardiac magnetic resonance imaging cohort. UTNet demonstrates superior segmentation performance and robustness compared with state-of-the-art approaches, holding promise to generalize well to other medical image segmentation tasks.
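The abstract does not spell out how the cost of self-attention is brought down to roughly linear in the number of tokens n. One common way to get that effect is to shrink the keys and values to a fixed, small number m of tokens before computing attention, so the score matrix has n x m entries instead of n x n. The sketch below illustrates only that general idea under this assumption. It is not UTNet's implementation: the names `pool_tokens`, `reduced_attention`, and `m` are hypothetical, average pooling is a stand-in for whatever reduction the paper uses, and relative position encoding is left out.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention; builds a len(q) x len(k) score matrix."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def pool_tokens(x, m):
    """Average consecutive groups of tokens so that m tokens remain.

    Assumed reduction for illustration; requires len(x) to be divisible by m.
    """
    group = len(x) // m
    return [
        [sum(tok[c] for tok in x[g * group:(g + 1) * group]) / group
         for c in range(len(x[0]))]
        for g in range(m)
    ]


def reduced_attention(q, k, v, m):
    """Attention of all n queries against only m pooled keys/values."""
    return attention(q, pool_tokens(k, m), pool_tokens(v, m))


random.seed(0)
n, d, m = 64, 4, 8  # n tokens, d channels, m pooled tokens (toy sizes)
q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
           for _ in range(3))

out = reduced_attention(q, k, v, m)
full_scores, reduced_scores = n * n, n * m
print(len(out), len(out[0]), full_scores, reduced_scores)
```

With these toy sizes the full score matrix would hold 64 x 64 = 4096 entries, while the reduced one holds 64 x 8 = 512. Because m stays fixed as n grows, the cost scales linearly in n, which is the behaviour the abstract describes as approximately O(n).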
Pages: 61-71
Page count: 11
Related papers
28 entries in total
  • [1] Attention Augmented Convolutional Networks
    Bello, Irwan
    Zoph, Barret
    Vaswani, Ashish
    Shlens, Jonathon
    Le, Quoc V.
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 3285 - 3294
  • [2] Campello V.M., 2020, MULTICENTRE MULTIVEN
  • [3] Dosovitskiy A, 2021, ICLR 2021 9 INT C LE
  • [4] Dual Attention Network for Scene Segmentation
    Fu, Jun
    Liu, Jing
    Tian, Haijie
    Li, Yong
    Bao, Yongjun
    Fang, Zhiwei
    Lu, Hanqing
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3141 - 3149
  • [5] FocusNetv2: Imbalanced large and small organ segmentation with adversarial shape constraint for head and neck CT image
    Gao, Yunhe
    Huang, Rui
    Yang, Yiwei
    Zhang, Jie
    Shao, Kainan
    Tao, Changjuan
    Chen, Yuanyuan
    Metaxas, Dimitris N.
    Li, Hongsheng
    Chen, Ming
    [J]. MEDICAL IMAGE ANALYSIS, 2021, 67
  • [6] Multi-resolution Path CNN with Deep Supervision for Intervertebral Disc Localization and Segmentation
    Gao, Yunhe
    Liu, Chang
    Zhao, Liang
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT II, 2019, 11765 : 309 - 317
  • [7] Identity Mappings in Deep Residual Networks
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. COMPUTER VISION - ECCV 2016, PT IV, 2016, 9908 : 630 - 645
  • [8] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [9] Huang QY, 2019, I S BIOMED IMAGING, P1622, DOI [10.1109/ISBI.2019.8759423, 10.1109/isbi.2019.8759423]
  • [10] CCNet: Criss-Cross Attention for Semantic Segmentation
    Huang, Zilong
    Wang, Xinggang
    Huang, Lichao
    Huang, Chang
    Wei, Yunchao
    Liu, Wenyu
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 603 - 612