Medical Image Segmentation Based on Multi-Scale Convolution Modulation

Cited by: 0

Authors:

Zhou, Xin-Min [1,2]

Xiong, Zhi-Mou [3]

Shi, Chang-Fa [4,5]

Yang, Jian [3]

Affiliations:

[1] School of Artificial Intelligence and Advanced Computing, Hunan University of Technology and Business, Hunan, Changsha

[2] Xiangjiang Laboratory, Hunan, Changsha

[3] School of Computer Science, Hunan University of Technology and Business, Hunan, Changsha

[4] School of Intelligent Engineering and Intelligent Manufacturing, Hunan University of Technology and Business, Hunan, Changsha

[5] Changsha Social Laboratory of Artificial Intelligence, Hunan University of Technology and Business, Hunan, Changsha

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2024, Vol. 52, No. 9

Funding:

National Natural Science Foundation of China

Keywords:

convolutional modulation; medical image segmentation; multi-scale; Transformer

DOI:

10.12263/DZXB.20231068

Abstract:

A growing number of medical image segmentation models use the Transformer as their basic structure. However, the computational complexity of the Transformer is quadratic in the length of the input sequence, and it requires pre-training on a large amount of data to achieve good results. When data are insufficient, the Transformer's advantages cannot be fully realized. In addition, the Transformer often fails to extract local information from images effectively. In contrast, convolutional neural networks can effectively avoid these two problems. To leverage the strengths of both convolutional neural networks and Transformers, and to further explore the potential of convolutional neural networks, this paper proposes a multi-scale convolution modulation network (MSCMNet). The model incorporates the design methodology of vision Transformer models into traditional convolutional networks. Using convolution modulation and multi-scale feature extraction strategies, it constructs a feature extraction module based on multi-scale convolution modulation (MSCM). Efficient patch combination and patch decomposition strategies are also proposed for downsampling and upsampling feature maps, respectively, further enhancing the model's representation ability. On four medical image segmentation datasets of different types and sizes (abdominal multi-organ, heart, skin cancer, and nucleus), the model obtains mDice scores of 0.8057, 0.9233, 0.9239, and 0.8548, respectively. With lower computational complexity and parameter count, MSCMNet achieves the best segmentation performance, providing a novel and efficient model structure design paradigm for convolutional neural networks and Transformers in the field of medical image segmentation. © 2024 Chinese Institute of Electronics. All rights reserved.
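The mDice figures quoted in the abstract are mean Dice coefficients. The abstract does not describe the evaluation protocol, so the following is only a minimal sketch of one common convention: compute the Dice coefficient 2|P ∩ T| / (|P| + |T|) per foreground class and average over classes. The function names `dice_score` and `mean_dice`, the flattened label-map inputs, and the handling of absent classes are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], label: int) -> float:
    """Dice coefficient for one class: 2|P ∩ T| / (|P| + |T|)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    pred_count = sum(1 for p in pred if p == label)
    target_count = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    if pred_count + target_count == 0:
        # Assumption: a class absent from both masks counts as a perfect match.
        return 1.0
    return 2.0 * overlap / (pred_count + target_count)


def mean_dice(pred: Sequence[int], target: Sequence[int], labels: Sequence[int]) -> float:
    """Average the per-class Dice over the given foreground labels."""
    scores = [dice_score(pred, target, c) for c in labels]
    return sum(scores) / len(scores)


# Toy example: flattened 2x4 label maps with background 0 and classes 1 and 2.
pred = [0, 1, 1, 2, 0, 1, 2, 2]
target = [0, 1, 1, 1, 0, 2, 2, 2]
print(round(mean_dice(pred, target, labels=[1, 2]), 4))  # prints 0.6667
```

In the toy example each class has three predicted pixels, three ground-truth pixels, and two overlapping pixels, so both per-class scores are 4/6 and the mean is about 0.6667. Other conventions (for example, averaging per case before averaging per class) can give different numbers on real data.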


Pages: 3159-3171

Number of pages: 12
