DS-Former: A dual-stream encoding-based transformer for 3D medical image segmentation

Times cited: 0
Authors
Zhang, Lei [1]
Zuo, Yi [1]
Jia, Yu [2]
Li, Dongze [3]
Zeng, Rui [4]
Li, Dong [1, 5]
Chen, Junren [1]
Wang, Wei [6]
Affiliations
[1] Sichuan Univ, Comp Sci, 24 South Sect 1,Yihuan Rd, Chengdu 610065, Sichuan, Peoples R China
[2] Sichuan Univ, West China Hosp, Int Med Ctr Ward, Gen Practice Med Ctr,Gen Practice Ward, 37 Guoxue Alley, Chengdu 610065, Sichuan, Peoples R China
[3] Sichuan Univ, West China Hosp, West China Sch Med, Dept Emergency Med,Disaster Med Ctr, 37 Guoxue Alley, Chengdu 610065, Sichuan, Peoples R China
[4] Sichuan Univ, West China Hosp, West China Sch Med, Dept Cardiol, 37 Guoxue Alley, Chengdu 610065, Sichuan, Peoples R China
[5] Qinghai Univ, Dept Comp Sci, 251 Ningda Rd, Xining 810016, Qinghai, Peoples R China
[6] Chengdu Univ Informat Technol, Sch Automat, 24 Block 1,Xuefu Rd, Chengdu 610225, Sichuan, Peoples R China
Keywords
Dual-stream module; Transformer; Convolution; 3D medical image segmentation; Attention
DOI
10.1016/j.bspc.2023.105702
Chinese Library Classification (CLC)
R318 [Biomedical Engineering]
Discipline code
0831
Abstract
Models built on self-attention, such as Vision Transformers (ViTs), have shown promising performance in visual tasks such as semantic segmentation. This is attributed to their ability to capture global image features, which lets them learn more comprehensive representations. However, transformer-based models typically require a large amount of training data to perform well, and they are less efficient at extracting local image features. As a result, they may be less effective in computer vision tasks that involve small datasets, such as medical image segmentation. To address these issues, this paper proposes a dual-stream encoding-based transformer, dubbed the Dual-stream Transformer (DS-Former). The dual-stream module in DS-Former captures local and global image features simultaneously and models the relationship between the two types of features via self-attention. Compared with simple concatenation or serial connection, the dual-stream module extracts more comprehensive and hierarchical feature information through the interactive fusion of the two feature types. We evaluate our method on the UK Biobank (UKBB) cardiac magnetic resonance (CMR) imaging dataset and the Beyond the Cranial Vault (BTCV) abdominal challenge dataset. The experimental results show that DS-Former outperforms other state-of-the-art approaches on both datasets, indicating its potential for semantic segmentation of medical images.
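The abstract describes the dual-stream idea only at a high level: one stream extracts local features, another extracts global features, and self-attention relates the two. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The local stream is assumed to be a depthwise 3×3×3 convolution, the global stream is assumed to be single-head self-attention over all voxel tokens, and fusion is assumed to be one joint self-attention pass over the concatenated local and global token sets, merged by summation with a residual connection. All function names (`local_stream`, `self_attention`, `dual_stream_block`, `init_params`) and the random weights are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product self-attention over an (N, C) token matrix.
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def local_stream(volume, kernel):
    # Assumed local stream: depthwise 3x3x3 convolution with zero padding.
    # Each channel is mixed only with its 26 spatial neighbours.
    d, h, w, c = volume.shape
    padded = np.pad(volume, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(volume)
    for i in range(3):
        for j in range(3):
            for k in range(3):
                # kernel[i, j, k] has shape (C,) and broadcasts over the grid.
                out += kernel[i, j, k] * padded[i:i + d, j:j + h, k:k + w]
    return out


def dual_stream_block(volume, params):
    # Hypothetical dual-stream block on a (D, H, W, C) feature volume.
    d, h, w, c = volume.shape
    n = d * h * w
    tokens = volume.reshape(n, c)
    # Local stream: convolution over the voxel grid, flattened to N tokens.
    local = local_stream(volume, params["kernel"]).reshape(n, c)
    # Global stream: self-attention across all N voxel tokens.
    global_ = self_attention(tokens, *params["global_qkv"])
    # Assumed fusion: joint self-attention over the 2N concatenated tokens,
    # so every local token can attend to every global token and vice versa.
    joint = np.concatenate([local, global_], axis=0)
    fused = self_attention(joint, *params["fuse_qkv"])
    # Merge the two fused halves by summation and add a residual connection.
    out = tokens + fused[:n] + fused[n:]
    return out.reshape(d, h, w, c)


def init_params(c, rng):
    # Hypothetical random initialisation, only for checking shapes.
    def qkv():
        return tuple(rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    return {
        "kernel": rng.standard_normal((3, 3, 3, c)) / 27,
        "global_qkv": qkv(),
        "fuse_qkv": qkv(),
    }


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8, 16))  # (depth, height, width, channels)
y = dual_stream_block(x, init_params(16, rng))
print(y.shape)  # (4, 8, 8, 16)
```

Attending over the concatenated token sets is one way to let local and global features interact across positions, whereas plain channel-wise concatenation combines the two only voxel by voxel. A practical 3D model would apply global attention to downsampled patches rather than every voxel, because the attention matrix grows quadratically with the number of tokens.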
Pages: 11