Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation

Cited by: 0

Authors:

Rahman, Md Mostafijur [1]

Marculescu, Radu [1]

Affiliations:

[1] Univ Texas Austin, Dept ECE, Syst Level Design Grp, Austin, TX 78712 USA

Source:

MEDICAL IMAGING WITH DEEP LEARNING, VOL 227 | 2023 / Volume 227

Keywords:

Medical image segmentation; Vision transformer; Multi-scale transformer; Feature-mixing augmentation; Self-attention;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformers have shown great success in medical image segmentation. However, transformers may exhibit a limited generalization ability due to the underlying single-scale self-attention (SA) mechanism. In this paper, we address this issue by introducing a Multiscale hiERarchical vIsion Transformer (MERIT) backbone network, which improves the generalizability of the model by computing SA at multiple scales. We also incorporate an attention-based decoder, namely Cascaded Attention Decoding (CASCADE), for further refinement of the multi-stage features generated by MERIT. Finally, we introduce an effective multi-stage feature mixing loss aggregation (MUTATION) method for better model training via implicit ensembling. Our experiments on two widely used medical image segmentation benchmarks (i.e., Synapse Multi-organ and ACDC) demonstrate the superior performance of MERIT over state-of-the-art methods. Our MERIT architecture and MUTATION loss aggregation can be used with other downstream medical image and semantic segmentation tasks.
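The abstract names a multi-stage feature mixing loss aggregation (MUTATION) but does not spell out the mixing rule. As a rough illustration only, the sketch below assumes that mixing means summing the prediction maps of every non-empty subset of decoder stages and that the per-subset losses are simply added up. The function names (`bce_with_logits`, `mixed_predictions`, `aggregated_loss`), the binary cross-entropy loss, and the toy data are all hypothetical choices made for this example, not details taken from the record.

```python
import math
from itertools import combinations


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*t + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mixed_predictions(stage_logits):
    """Yield (subset, summed_logits) for every non-empty subset of stages.

    ASSUMPTION: "feature mixing" is modelled as an element-wise sum of the
    selected stages' prediction maps (flattened to 1-D lists here).
    """
    n_stages = len(stage_logits)
    n_pixels = len(stage_logits[0])
    for size in range(1, n_stages + 1):
        for subset in combinations(range(n_stages), size):
            mixed = [sum(stage_logits[i][k] for i in subset)
                     for k in range(n_pixels)]
            yield subset, mixed


def aggregated_loss(stage_logits, targets, loss_fn=bce_with_logits):
    """Sum the loss over all 2**n - 1 mixed predictions (assumed rule)."""
    return sum(loss_fn(mixed, targets)
               for _, mixed in mixed_predictions(stage_logits))


# Toy data: three hypothetical decoder stages, four pixels each.
stage_logits = [
    [2.0, -1.0, 0.5, -3.0],
    [1.5, -0.5, -0.2, -2.0],
    [0.3, 0.1, 1.0, -1.0],
]
targets = [1.0, 0.0, 1.0, 0.0]

n_mixed = sum(1 for _ in mixed_predictions(stage_logits))
loss = aggregated_loss(stage_logits, targets)
print(n_mixed, loss)
```

Under this assumption, n stages yield 2^n - 1 mixed predictions, so each training step supervises many combinations of stage outputs at once, which is one plausible reading of the "implicit ensembling" mentioned in the abstract.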


Pages: 1526-1544

Page count: 19

