VM-UNET-V2: Rethinking Vision Mamba UNet for Medical Image Segmentation

Citations: 30

Authors:

Zhang, Mingya [1]

Yu, Yue [2]

Jin, Sun [4]

Gu, Limei [3]

Ling, Tingsheng [3]

Tao, Xianping [1]

Affiliations:

[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China

[2] Huazhong Univ Sci & Technol, Wuhan, Peoples R China

[3] Nanjing Univ Chinese Med, Affiliated Hosp, Nanjing, Peoples R China

[4] Nanjing Univ Chinese Med, Nanjing, Peoples R China

Source:

BIOINFORMATICS RESEARCH AND APPLICATIONS, PT I, ISBRA 2024 | 2024 / Vol. 14954

Keywords:

Medical Image Segmentation; UNet; Vision State Space Models

DOI:

10.1007/978-981-97-5128-0_27

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In the field of medical image segmentation, models based on both CNNs and Transformers have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to fully exploit the semantic information within images. Transformers, on the other hand, are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), such as Mamba, have been recognized as a promising approach. They not only demonstrate superior performance in modeling long-range interactions, but also preserve linear computational complexity. Inspired by the Mamba architecture, we propose Vision Mamba-UNetV2 (VM-UNetV2): the Visual State Space (VSS) Block is introduced to capture extensive contextual information, and the Semantics and Detail Infusion (SDI) module is introduced to augment the infusion of low-level and high-level features. We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB and ETIS-LaribPolypDB public datasets. The results indicate that VM-UNetV2 exhibits competitive performance in medical image segmentation tasks. Our code is available at https://github.com/nobodyplayer1/VM-UNetV2.
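The linear-complexity claim in the abstract can be illustrated with a minimal sketch of a generic discrete state space recurrence, h_t = a * h_{t-1} + b * x_t and y_t = c · h_t, with a diagonal state transition. This is not the authors' VSS Block and not Mamba itself (Mamba makes the parameters input-dependent and uses a hardware-aware scan). The function name `ssm_scan` and the parameter values are assumptions chosen for illustration only.

```python
def ssm_scan(xs, a, b, c):
    """Run a fixed-parameter, diagonal linear state space recurrence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)

    Hypothetical helper for illustration; not taken from the VM-UNetV2 code.
    """
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    ys = []
    for x in xs:  # a single pass: cost grows linearly with len(xs)
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# Impulse input: the output decays as the state is repeatedly scaled by a.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5, 0.25], b=[1.0, 1.0], c=[1.0, 2.0])
print(ys)
```

Each step touches only the fixed-size state, so the total work is proportional to the sequence length, whereas self-attention compares every position with every other position.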

Pages: 335-346

Page count: 12

References (22 in total):

[1] Chen J., 2021, arXiv

[2] Chen JC, 2016, SCI REP-UK, V6, DOI [10.1038/srep25671, 10.1038/srep24454]

[3] Chen, Liang-Chieh; Zhu, Yukun; Papandreou, George; Schroff, Florian; Adam, Hartwig. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation [J]. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211: 833-851

[4] Deng-Ping Fan, 2020, Medical Image Computing and Computer Assisted Intervention - MICCAI 2020. 23rd International Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12266), P263, DOI 10.1007/978-3-030-59725-2_26

[5] Howard AG, 2017, arXiv, DOI arXiv:1704.04861

[6] Gao YH, 2022, arXiv, DOI [arXiv:2203.00131, 10.48550/arXiv.2203.00131]

[7] Golan R, 2016, IEEE IJCNN, P243, DOI 10.1109/IJCNN.2016.7727205

[8] Gu A, 2024, arXiv, DOI [arXiv:2312.00752, 10.48550/arXiv.2312.00752]

[9] Liu, Shu; Qi, Lu; Qin, Haifang; Shi, Jianping; Jia, Jiaya. Path Aggregation Network for Instance Segmentation [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 8759-8768

[10] Liu Y, 2024, arXiv, DOI [arXiv:2401.10166, 10.48550/arXiv.2401.10166]
