DS-Former: A dual-stream encoding-based transformer for 3D medical image segmentation

Cited by: 0

Authors:

Zhang, Lei ^{[1]}

Zuo, Yi ^{[1]}

Jia, Yu ^{[2]}

Li, Dongze ^{[3]}

Zeng, Rui ^{[4]}

Li, Dong ^{[1,5]}

Chen, Junren ^{[1]}

Wang, Wei ^{[6]}

Affiliations:

[1] Sichuan Univ, Comp Sci, 24 South Sect 1,Yihuan Rd, Chengdu 610065, Sichuan, Peoples R China

[2] Sichuan Univ, West China Hosp, Int Med Ctr Ward, Gen Practice Med Ctr,Gen Practice Ward, 37 Guoxue Alley, Chengdu 610065, Sichuan, Peoples R China

[3] Sichuan Univ, West China Hosp, West China Sch Med, Dept Emergency Med,Disaster Med Ctr, 37 Guoxue Alley, Chengdu 610065, Sichuan, Peoples R China

[4] Sichuan Univ, West China Hosp, West China Sch Med, Dept Cardiol, 37 Guoxue Alley, Chengdu 610065, Sichuan, Peoples R China

[5] Qinghai Univ, Dept Comp Sci, 251 Ningda Rd, Xining 810016, Qinghai, Peoples R China

[6] Chengdu Univ Informat Technol, Sch Automat, 24 Block 1,Xuefu Rd, Chengdu 610225, Sichuan, Peoples R China

Source:

BIOMEDICAL SIGNAL PROCESSING AND CONTROL | 2024 / Vol. 89

Keywords:

Dual-stream module; Transformer; Convolution; 3D medical image segmentation; Attention

DOI:

10.1016/j.bspc.2023.105702

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

Models built on self-attention, such as Vision Transformers (ViTs), have shown promising performance in visual tasks like semantic segmentation. This is attributed to their capacity to capture global image features, which lets them learn more comprehensive representations. However, transformer-based models typically require large amounts of training data to perform well, and they extract local image features inefficiently. As a result, they may be less effective in computer vision tasks with small-scale datasets, such as medical image segmentation. To address these issues, this paper proposes a dual-stream encoding-based transformer, the Dual-stream Transformer (DS-Former). The dual-stream module in DS-Former simultaneously acquires local and global image features and constructs relations between the two kinds of features via self-attention. Compared with simple splicing or serial connection, the dual-stream module extracts more comprehensive and hierarchical feature information from the interaction and fusion of the two feature types. The method is evaluated on the UK Biobank (UKBB) cardiac magnetic resonance imaging (CMR) dataset and the Beyond the Cranial Vault (BTCV) abdominal challenge dataset. The experimental results indicate that DS-Former outperforms other state-of-the-art approaches on both datasets, suggesting its potential for semantic segmentation of medical images.
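
The dual-stream idea described in the abstract can be illustrated with a toy sketch. This is not the DS-Former architecture: the abstract does not give layer details, so everything here is an assumption made for illustration. The function names (`local_stream`, `global_stream`, `dual_stream_block`) are hypothetical, a neighbour average stands in for a convolution, attention uses no learned projections, and the input is a short 1D token sequence rather than a 3D volume. The sketch only shows one way local and global features could be computed in parallel and then related through attention instead of being concatenated.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections (assumption)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def local_stream(tokens, radius=1):
    """Stand-in for a convolution: average each token with its neighbours."""
    n = len(tokens)
    out = []
    for i in range(n):
        window = tokens[max(0, i - radius): min(n, i + radius + 1)]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def global_stream(tokens):
    """Self-attention over all tokens, so every output sees the whole input."""
    return attend(tokens, tokens, tokens)


def dual_stream_block(tokens):
    """Run both streams in parallel, then relate them through attention.

    Local features act as queries over the global features; the result is
    added back to the local features (a residual connection, also assumed).
    """
    local = local_stream(tokens)
    global_ = global_stream(tokens)
    fused = attend(local, global_, global_)
    return [[l + f for l, f in zip(l_row, f_row)]
            for l_row, f_row in zip(local, fused)]


# Toy input: four tokens with two channels each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
fused_tokens = dual_stream_block(tokens)
```

The output has the same shape as the input (one fused vector per token). Using attention for the fusion step, rather than concatenating the two streams, means each local feature is weighted against every global feature, which is the kind of interaction the abstract contrasts with simple splicing or serial connection.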


Pages: 11

