MambaSOD: Dual Mamba-driven cross-modal fusion network for RGB-D Salient Object Detection

Cited by: 0
Authors
Zhan, Yue [2]
Zeng, Zhihong [1, 3]
Liu, Haijun [3]
Tan, Xiaoheng [3]
Tian, Yinli [4]
Affiliations
[1] Guangdong Polytech Normal Univ, Inst Interdisciplinary Studies, Guangzhou, Peoples R China
[2] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 400044, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Software Engn, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RGB-D salient object detection; State Space Model; Mamba-based backbone; Cross-modal Fusion Mamba
DOI
10.1016/j.neucom.2025.129718
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
RGB-D Salient Object Detection (SOD) aims to accurately pinpoint the most visually conspicuous regions in an image. Many conventional models rely heavily on CNNs and overlook long-range contextual dependencies; subsequent Transformer-based models address this issue to some extent but introduce quadratic computational complexity. Moreover, incorporating spatial information from depth maps has proven effective for this task, and the primary challenge is how to effectively fuse the complementary information from the RGB and depth modalities. Recent advances in Mamba, particularly its ability to model long-range dependencies with linear complexity, motivate our exploration of its potential for RGB-D SOD. In this paper, we propose a dual Mamba-driven cross-modal fusion network for RGB-D SOD, named MambaSOD, which leverages Mamba's long-range dependency modeling capability. Specifically, we employ a dual Mamba-driven feature extractor to process the RGB and depth inputs and obtain features with global contextual information. We then design a cross-modal fusion Mamba to perform modality-specific feature enhancement and to model the inter-modal correlation between the RGB and depth features. To the best of our knowledge, this work is an innovative attempt to explore the potential of pure Mamba in the RGB-D SOD task, offering a novel perspective. Extensive experiments on seven prevailing datasets demonstrate our method's superiority over eighteen state-of-the-art RGB-D SOD models. The source code will be released at https://github.com/YueZhan721/MambaSOD.
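The linear-complexity claim in the abstract can be illustrated with a minimal sketch of a discrete state-space recurrence. This is not the authors' implementation and not Mamba's selective scan: it assumes a scalar state and fixed parameters, and the function name ssm_scan and the parameters a, b, c are hypothetical, chosen only for illustration. It shows that a single pass over the sequence lets every output depend on all earlier inputs, whereas pairwise attention compares every pair of positions.

```python
def ssm_scan(xs, a, b, c):
    """Scalar linear state-space recurrence (illustrative only).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    One pass over the sequence: O(L) time for L inputs.
    Assumption: fixed scalar a, b, c; Mamba itself makes these
    parameters input-dependent and uses a vector-valued state.
    """
    h = 0.0  # hidden state carrying context from all earlier inputs
    ys = []
    for x in xs:
        h = a * h + b * x  # fold the new input into the running state
        ys.append(c * h)   # read the output out of the state
    return ys


if __name__ == "__main__":
    # An impulse at position 0 still influences later outputs,
    # decaying by a factor of a per step.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

With a = 0.5, an impulse at the first position decays geometrically but is never cut off by a fixed window, which is the sense in which such a recurrence captures long-range context in linear time.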
Pages: 11