MISSU: 3D Medical Image Segmentation via Self-Distilling TransUNet

Cited by: 34
Authors
Wang, Nan [1 ]
Lin, Shaohui [1 ]
Li, Xiaoxiao [2 ]
Li, Ke [3 ]
Shen, Yunhang [3 ]
Gao, Yue [4 ]
Ma, Lizhuang [1 ]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200062, Peoples R China
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver V6T 1Z4, BC, Canada
[3] Tencent, Youtu Lab, Shanghai 200123, Peoples R China
[4] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image segmentation; Transformers; Three-dimensional displays; Semantics; Feature extraction; Computational modeling; Training; Self-distillation; transformer; medical image segmentation; 3D convolutional neural networks; NETWORK;
DOI
10.1109/TMI.2023.3264433
CLC number
TP39 [Applications of computers];
Discipline codes
081203 ; 0835 ;
Abstract
U-Nets have achieved tremendous success in medical image segmentation. Nevertheless, they may be limited in capturing global (long-range) contextual interactions and preserving edge details. In contrast, the Transformer module excels at capturing long-range dependencies by incorporating a self-attention mechanism into the encoder. Although the Transformer module was designed to model long-range dependencies on extracted feature maps, it still incurs high computational and spatial complexity when processing high-resolution 3D feature maps. This motivates us to design an efficient Transformer-based UNet model and to study the feasibility of Transformer-based network architectures for medical image segmentation tasks. To this end, we propose to self-distill a Transformer-based UNet for medical image segmentation, which simultaneously learns global semantic information and local spatial-detailed features. Meanwhile, we are the first to propose a local multi-scale fusion block, which refines fine-grained details from the skip connections in the encoder via the main CNN stem through self-distillation; it is computed only during training and removed at inference, incurring minimal overhead. Extensive experiments on the BraTS 2019 and CHAOS datasets show that our MISSU achieves the best performance over previous state-of-the-art methods. Code and models are available at: https://github.com/wangn123/MISSU.git
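The self-distillation idea sketched in the abstract — an auxiliary branch trained to match the main stem's features during training and dropped at inference — can be illustrated with a minimal toy example. This is not the authors' implementation; the shapes, names, and plain mean-squared-error objective here are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_distill_loss(student_feat, teacher_feat):
    """MSE between the lightweight branch's features (student) and the
    main stem's features (teacher). In a real framework the teacher
    would be detached (stop-gradient) so only the student is updated."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

# Toy 3D feature maps: (channels, depth, height, width).
teacher = rng.standard_normal((8, 4, 16, 16))
# The student starts as a noisy approximation of the teacher.
student = teacher + 0.1 * rng.standard_normal((8, 4, 16, 16))

# Computed only during training; at inference the student branch
# (and this loss) are removed entirely, hence the minimal overhead.
loss = self_distill_loss(student, teacher)
print(round(loss, 4))
```

Because the distillation loss only shapes training, the deployed network keeps the main stem's inference cost unchanged.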
Pages: 2740-2750
Page count: 11