MIXED TRANSFORMER U-NET FOR MEDICAL IMAGE SEGMENTATION

Cited by: 248
Authors
Wang, Hongyi [1 ]
Xie, Shiao [1 ]
Lin, Lanfen [1 ]
Iwamoto, Yutaro [2 ]
Han, Xian-Hua [3 ]
Chen, Yen-Wei [1 ,3 ,4 ]
Tong, Ruofeng [1 ,4 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Ritsumeikan Univ, Coll Informat Sci & Engn, Kyoto, Japan
[3] Yamaguchi Univ, Artificial Intelligence Res Ctr, Yamaguchi, Japan
[4] Res Ctr Healthcare Data Sci, Zhejiang Lab, Hangzhou, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Medical image segmentation; Deep learning; Vision Transformer; Self-attention;
DOI
10.1109/ICASSP43922.2022.9746172
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore recently emerged as alternative segmentation structures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). Then, it mines inter-connections between data samples through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We test our method on two public datasets, and the experimental results show that the proposed method outperforms other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
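To make the two attention mechanisms in the abstract concrete, here are two minimal PyTorch sketches. Neither is the authors' released implementation (see the GitHub link above); all module, function, and parameter names here are illustrative. The first shows only the general idea of Gaussian-weighted self-attention: adding a bias to the attention logits that decays with squared spatial distance, so nearby tokens dominate. The full LGG-SA additionally splits attention into fine-grained local and coarse-grained global stages, which this sketch omits.

```python
import torch

def gaussian_logit_bias(h: int, w: int, sigma: float = 3.0) -> torch.Tensor:
    """Additive (h*w, h*w) bias for attention logits on an h-by-w feature map:
    spatially close query/key pairs get a bias near 0, distant pairs a large
    negative one. Add to Q.K^T / sqrt(d) before the softmax."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # (N, N)
    return -sq_dist / (2.0 * sigma ** 2)
```

The second sketches External Attention in the style of Guo et al.: each token attends to a small learnable memory shared across the whole dataset rather than to the other tokens in the same sample, which is what lets EA capture inter-sample correlations at linear cost in the token count. The double-normalization step (softmax over tokens, then L1 over memory slots) follows the published EA formulation; the exact dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """Tokens attend to a learnable external memory shared by all samples,
    instead of to each other as in plain self-attention."""
    def __init__(self, dim: int, mem_slots: int = 64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_slots, bias=False)  # external "key" memory
        self.mv = nn.Linear(mem_slots, dim, bias=False)  # external "value" memory

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        attn = self.mk(x).softmax(dim=1)                       # normalize over tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # L1 norm over memory slots
        return self.mv(attn)                                   # (batch, tokens, dim)

if __name__ == "__main__":
    ea = ExternalAttention(dim=256, mem_slots=64)
    tokens = torch.randn(2, 14 * 14, 256)   # e.g. a flattened 14x14 feature map
    print(ea(tokens).shape)                 # torch.Size([2, 196, 256])
```

Because the memory matrices are learned over the entire training set and have far fewer slots than there are tokens, EA costs O(N) per sample rather than the O(N^2) of standard self-attention.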
Pages: 2390-2394
Page count: 5