VM-UNET-V2: Rethinking Vision Mamba UNet for Medical Image Segmentation

Cited by: 30
Authors
Zhang, Mingya [1]
Yu, Yue [2]
Jin, Sun [4]
Gu, Limei [3]
Ling, Tingsheng [3]
Tao, Xianping [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[3] Nanjing Univ Chinese Med, Affiliated Hosp, Nanjing, Peoples R China
[4] Nanjing Univ Chinese Med, Nanjing, Peoples R China
Source
BIOINFORMATICS RESEARCH AND APPLICATIONS, PT I, ISBRA 2024 | 2024 / Vol. 14954
Keywords
Medical Image Segmentation; UNet; Vision State Space Models
DOI
10.1007/978-981-97-5128-0_27
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the field of medical image segmentation, models based on both CNNs and Transformers have been thoroughly investigated. However, CNNs have limited capacity to model long-range dependencies, which makes it difficult to fully exploit the semantic information within images. Transformers, on the other hand, are challenged by their quadratic computational complexity. Recently, State Space Models (SSMs), such as Mamba, have been recognized as a promising approach. They not only demonstrate superior performance in modeling long-range interactions but also preserve linear computational complexity. Inspired by the Mamba architecture, we propose Vision Mamba-UNetV2 (VM-UNetV2), in which the Visual State Space (VSS) Block is introduced to capture extensive contextual information, and the Semantics and Detail Infusion (SDI) is introduced to augment the infusion of low-level and high-level features. We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB and ETIS-LaribPolypDB public datasets. The results indicate that VM-UNetV2 exhibits competitive performance in medical image segmentation tasks. Our code is available at https://github.com/nobodyplayer1/VM-UNetV2.
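The linear-complexity claim for SSMs can be illustrated with a minimal sketch. The function below is a hypothetical, scalar, time-invariant state space recurrence written for illustration only; it is not Mamba's selective scan and is not taken from the VM-UNetV2 code. The name `ssm_scan` and the parameters `a`, `b`, `c` are assumptions. It shows that each input is visited once, so the cost grows linearly with sequence length, while the state still carries information from earlier inputs.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Scalar linear state space recurrence (illustrative assumption, not Mamba).

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0
    ys = []
    for x in xs:            # one pass over the sequence: O(L) work
        h = a * h + b * x   # the state summarizes all earlier inputs
        ys.append(c * h)
    return ys


# An impulse at the first position still influences later outputs,
# decaying by the factor `a` at every step.
outputs = ssm_scan([1.0, 0.0, 0.0, 0.0])
print(outputs)
```

By contrast, self-attention compares every position with every other position, which is where the quadratic cost mentioned in the abstract comes from.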
Pages: 335-346
Page count: 12