UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation

Cited by: 426
Authors
Gao, Yunhe [1]
Zhou, Mu [1, 2]
Metaxas, Dimitris N. [1]
Affiliations
[1] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
[2] SenseBrain & Shanghai AI Lab & Ctr Perceptual & I, Shanghai, Peoples R China
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT III | 2021 / Vol. 12903
Keywords
DOI
10.1007/978-3-030-87199-4_6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Transformer architecture has proven successful in a number of natural language processing tasks. However, its applications to medical vision remain largely unexplored. In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network to enhance medical image segmentation. UTNet applies self-attention modules in both the encoder and the decoder to capture long-range dependencies at different scales with minimal overhead. To this end, we propose an efficient self-attention mechanism, along with relative position encoding, that reduces the complexity of the self-attention operation significantly, from O(n^2) to approximately O(n). A new self-attention decoder is also proposed to recover fine-grained details from the skip connections in the encoder. Our approach addresses the dilemma that Transformers require huge amounts of data to learn vision inductive biases. Our hybrid layer design allows Transformer layers to be initialized within convolutional networks without the need for pre-training. We have evaluated UTNet on a multi-label, multi-vendor cardiac magnetic resonance imaging cohort. UTNet demonstrates superior segmentation performance and robustness compared with state-of-the-art approaches, holding the promise to generalize well to other medical image segmentation tasks.
Pages: 61-71
Page count: 11