MambaTriNet: A Mamba-Based Tribackbone Multimodal Remote Sensing Image Semantic Segmentation Model

Cited by: 0

Authors:

Ye, Famao ^{[1]}

Tan, Shubin ^{[2]}

Huang, Wenye ^{[2]}

Xu, Xiaohua ^{[3]}

Jiang, Shunliang ^{[2]}

Affiliations:

[1] East China Univ Technol, Key Lab Mine Environm Monitoring & Improving Poyan, Minist Nat Resources, Nanchang 330013, Peoples R China

[2] Nanchang Univ, Sch Math & Comp, Nanchang 330031, Peoples R China

[3] Jiangxi Acad Water Sci & Engn, Nanchang 330029, Peoples R China

Source:

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS | 2025 / Vol. 22

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Remote sensing; Training; Semantic segmentation; Data mining; Computer architecture; Semantics; Cross layer design; Computational modeling; Artificial intelligence; Multimodal feature fusion; remote sensing; semantic segmentation; visual state-space model (SSM); TRANSFORMER; NETWORK;

DOI:

10.1109/LGRS.2025.3566965

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry];

Discipline classification code:

0708 ; 070902 ;

Abstract:

Integrating digital surface models (DSMs) with remote sensing images has emerged as a pivotal strategy for remote sensing semantic segmentation. Prevailing dual-branch convolutional frameworks process the DSM and the remote sensing image independently, but the local receptive fields of convolutional operations limit their ability to model long-range contextual dependencies. Moreover, the hierarchical features in the U-model inherently embody multiscale complementary relationships, yet current multimodal fusion paradigms insufficiently exploit this architectural advantage. In this letter, we propose an efficient three-branch encoder that simultaneously extracts DSM features and the local and global features of remote sensing images. We use the recently introduced Mamba instead of the Transformer to capture long-range dependencies, significantly reducing computational complexity while maintaining competitive performance. The extracted features are integrated by the trifeature complementary fusion (TriFusion) module, which employs a stepwise fusion strategy to learn the spatial complementarity between global and local features and the channelwise complementarity between remote sensing images and the DSM. In addition, we introduce a cross-layer feature guidance (CLFG) module within the skip connections to improve segmentation accuracy by enhancing cross-layer feature propagation. Extensive experiments on two high-resolution remote sensing datasets, ISPRS Vaihingen and ISPRS Potsdam, demonstrate that the proposed MambaTriNet surpasses existing state-of-the-art methods, achieving a mean intersection over union (mIoU) of 83.84% and 86.05%, respectively.
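The abstract reports results as mean intersection over union (mIoU). As a reading aid, the following is a minimal sketch of how mIoU is commonly computed from flattened label maps; it is not the authors' evaluation code. The function names (`confusion_matrix`, `mean_iou`) are illustrative, and the choice to skip classes that appear in neither the ground truth nor the prediction is an assumption. The record does not state which classes the reported scores include.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur.

    Assumption: a class absent from both ground truth and prediction is
    skipped rather than counted as IoU 0 or 1.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # missed pixels of class c
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom == 0:
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with two classes and four pixels (flattened label maps).
    truth = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    print(f"mIoU = {mean_iou(truth, pred, num_classes=2):.4f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Real evaluations accumulate the confusion matrix over every test tile before taking the per-class ratios.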

Pages: 5
