MIXED TRANSFORMER U-NET FOR MEDICAL IMAGE SEGMENTATION

Cited by: 248
Authors
Wang, Hongyi [1 ]
Xie, Shiao [1 ]
Lin, Lanfen [1 ]
Iwamoto, Yutaro [2 ]
Han, Xian-Hua [3 ]
Chen, Yen-Wei [1 ,3 ,4 ]
Tong, Ruofeng [1 ,4 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Ritsumeikan Univ, Coll Informat Sci & Engn, Kyoto, Japan
[3] Yamaguchi Univ, Artificial Intelligence Res Ctr, Yamaguchi, Japan
[4] Res Ctr Healthcare Data Sci, Zhejiang Lab, Hangzhou, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Medical image segmentation; Deep learning; Vision Transformer; Self-attention;
DOI
10.1109/ICASSP43922.2022.9746172
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore recently emerged as alternative segmentation structures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). Then, it mines inter-connections between data samples through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We test our method on two public datasets, and the experimental results show that the proposed method outperforms other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
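To make the two attention mechanisms in the abstract concrete, here are two minimal PyTorch sketches. Neither is the authors' released implementation (see the GitHub link above); all module, function, and parameter names here are illustrative. The first shows only the general idea of Gaussian-weighted self-attention: adding a bias to the attention logits that decays with squared spatial distance, so nearby tokens dominate. The full LGG-SA additionally splits attention into fine-grained local and coarse-grained global stages, which this sketch omits.

```python
import torch

def gaussian_logit_bias(h: int, w: int, sigma: float = 3.0) -> torch.Tensor:
    """Additive (h*w, h*w) bias for attention logits on an h-by-w feature map:
    spatially close query/key pairs get a bias near 0, distant pairs a large
    negative one. Add to Q.K^T / sqrt(d) before the softmax."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()  # (N, 2)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # (N, N)
    return -sq_dist / (2.0 * sigma ** 2)
```

The second sketches External Attention in the style of Guo et al.: each token attends to a small learnable memory shared across the whole dataset rather than to the other tokens in the same sample, which is what lets EA capture inter-sample correlations at linear cost in the token count. The double-normalization step (softmax over tokens, then L1 over memory slots) follows the published EA formulation; the exact dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """Tokens attend to a learnable external memory shared by all samples,
    instead of to each other as in plain self-attention."""
    def __init__(self, dim: int, mem_slots: int = 64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_slots, bias=False)  # external "key" memory
        self.mv = nn.Linear(mem_slots, dim, bias=False)  # external "value" memory

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        attn = self.mk(x).softmax(dim=1)                       # normalize over tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # L1 norm over memory slots
        return self.mv(attn)                                   # (batch, tokens, dim)

if __name__ == "__main__":
    ea = ExternalAttention(dim=256, mem_slots=64)
    tokens = torch.randn(2, 14 * 14, 256)   # e.g. a flattened 14x14 feature map
    print(ea(tokens).shape)                 # torch.Size([2, 196, 256])
```

Because the memory matrices are learned over the entire training set and have far fewer slots than there are tokens, EA costs O(N) per sample rather than the O(N^2) of standard self-attention.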
Pages: 2390-2394
Page count: 5