MambaSOD: Dual Mamba-driven cross-modal fusion network for RGB-D Salient Object Detection

Citations: 3
Authors
Zhan, Yue [2]
Zeng, Zhihong [1, 3]
Liu, Haijun [3]
Tan, Xiaoheng [3]
Tian, Yinli [4]
Affiliations
[1] Guangdong Polytech Normal Univ, Inst Interdisciplinary Studies, Guangzhou, Peoples R China
[2] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
[3] Chongqing Univ, Sch Microelect & Commun Engn, Chongqing 400044, Peoples R China
[4] Chongqing Univ Posts & Telecommun, Sch Software Engn, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-D salient object detection; State Space Model; Mamba-based backbone; Cross-modal Fusion Mamba;
DOI
10.1016/j.neucom.2025.129718
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
RGB-D Salient Object Detection (SOD) aims to accurately pinpoint the most visually conspicuous regions in an image. Many conventional models rely heavily on CNNs and overlook long-range contextual dependencies; subsequent transformer-based models address this issue to some extent but introduce quadratic computational complexity. Moreover, incorporating spatial information from depth maps has proven effective for this task, and the primary challenge is how to fuse the complementary information from RGB and depth effectively. Recent advances in Mamba, particularly its ability to model long-range dependencies with linear complexity, have motivated our exploration of its potential in the RGB-D SOD task. In this paper, we propose a dual Mamba-driven cross-modal fusion network for RGB-D SOD, named MambaSOD, which leverages Mamba's long-range dependency modeling capability. Specifically, we employ a dual Mamba-driven feature extractor to process the RGB and depth inputs and obtain features with global contextual information. Then, we design a cross-modal fusion Mamba to perform modality-specific feature enhancement and to model the inter-modal correlation between the RGB and depth features. To the best of our knowledge, this work is an innovative attempt to explore the potential of pure Mamba in the RGB-D SOD task, offering a novel perspective. Extensive experiments on seven prevailing datasets demonstrate our method's superiority over eighteen state-of-the-art RGB-D SOD models. The source code will be released at https://github.com/YueZhan721/MambaSOD.
Pages: 11