Multi-scale Hierarchical Vision Transformer with Cascaded Attention Decoding for Medical Image Segmentation

Cited by: 0
Authors
Rahman, Md Mostafijur [1]
Marculescu, Radu [1]
Affiliations
[1] Univ Texas Austin, Dept ECE, Syst Level Design Grp, Austin, TX 78712 USA
Source
MEDICAL IMAGING WITH DEEP LEARNING, VOL 227 | 2023 / Vol. 227
Keywords
Medical image segmentation; Vision transformer; Multi-scale transformer; Feature-mixing augmentation; Self-attention;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformers have shown great success in medical image segmentation. However, transformers may exhibit a limited generalization ability due to the underlying single-scale self-attention (SA) mechanism. In this paper, we address this issue by introducing a Multi-scale hiERarchical vIsion Transformer (MERIT) backbone network, which improves the generalizability of the model by computing SA at multiple scales. We also incorporate an attention-based decoder, namely Cascaded Attention Decoding (CASCADE), for further refinement of the multi-stage features generated by MERIT. Finally, we introduce an effective multi-stage feature mixing loss aggregation (MUTATION) method for better model training via implicit ensembling. Our experiments on two widely used medical image segmentation benchmarks (i.e., Synapse Multi-organ and ACDC) demonstrate the superior performance of MERIT over state-of-the-art methods. Our MERIT architecture and MUTATION loss aggregation can be used with other downstream medical image and semantic segmentation tasks.
Pages: 1526-1544
Page count: 19