Multi-scale convolutional attention frequency-enhanced transformer network for medical image segmentation

Citations: 0
Authors
Yan, Shun [1]
Yang, Benquan [1]
Chen, Aihua [1]
Zhao, Xiaoming [1]
Zhang, Shiqing [1]
Affiliation
[1] Taizhou Univ, Inst Intelligent Informat Proc, Taizhou 318000, Zhejiang, Peoples R China
Keywords
Frequency-enhanced transformer; Multi-scale convolutional attention; Wavelet transform; Progressive attention; UNET;
DOI
10.1016/j.inffus.2025.103019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Automatic segmentation of medical images plays a crucial role in assisting doctors with diagnosis and treatment planning. Among segmentation approaches, the multi-scale vision transformer has become a powerful tool for medical image segmentation. However, its overly aggressive self-attention design leads to issues such as insufficient local feature extraction and a lack of detailed feature information. To address these problems, this study proposes the Multi-Scale Convolutional Attention Frequency-Enhanced Transformer Network (MCAFT), which includes Multi-Scale Convolutional Attention Frequency-Enhanced Transformer Modules (MCAFTM) and Multi-Scale Progressive Gate-Spatial Attention (MSGA). MCAFTM employs channel and spatial attention mechanisms, which are highly effective in capturing complex spatial relationships while focusing on prominent regions. Additionally, it applies the Discrete Wavelet Transform (DWT) to decompose input feature maps into sub-bands: a low-frequency sub-band (LL), which captures overall structural information, and high-frequency sub-bands (LH, HL, HH), which retain fine-grained details such as edges and textures. Subsequently, an efficient transformer and a reverse attention mechanism are employed to enhance contextual attention and boundary information. The proposed MSGA enhances multi-scale context by adaptively modeling inter-scale dependencies to bridge the semantic gap between encoder and decoder modules. Extensive experiments are conducted on several representative medical image segmentation tasks, including Synapse abdominal multi-organ, cardiac organ, and polyp lesion segmentation. The proposed MCAFTM achieves DICE scores of 83.87 and 92.32 for Synapse abdominal multi-organ and cardiac organ segmentation, respectively. For five polyp datasets (ClinicDB, Kvasir, ColonDB, ETIS, CVC-T), MCAFTM obtains DICE scores of 94.49, 92.62, 81.07, 78.68, and 88.91, respectively. These results demonstrate that both MCAFTM and MSGA are effective architectures.
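The DWT decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which wavelet the authors use, so the Haar wavelet here is an assumption, and the function name `haar_dwt2` is hypothetical. The LH/HL labels follow one common convention; other libraries swap them.

```python
def haar_dwt2(image):
    """Single-level 2D Haar DWT of a 2D list with even height and width.

    Assumption: Haar wavelet with orthonormal scaling; this is an
    illustration, not the transform used in the paper.
    Returns four sub-bands (LL, LH, HL, HH), each half the input size.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    ll, lh, hl, hh = [], [], [], []
    for r in range(0, rows, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, cols, 2):
            # One 2x2 block: a b on the top row, d e on the bottom row.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # block average: coarse structure
            lh_row.append((a + b - d - e) / 2.0)  # top minus bottom: horizontal edges
            hl_row.append((a - b + d - e) / 2.0)  # left minus right: vertical edges
            hh_row.append((a - b - d + e) / 2.0)  # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


# Usage: a 4x4 map with a vertical edge between columns 1 and 2.
feature_map = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
ll, lh, hl, hh = haar_dwt2(feature_map)
```

Each sub-band has half the spatial resolution of the input, which is why the low-frequency band can stand in for overall structure while the three high-frequency bands isolate edge and texture detail.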
Pages: 11