BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection

Cited by: 2

Authors:

Sun, Chenwang [1]

Zhang, Qing [1]

Zhuang, Chenyu [1]

Zhang, Mingqian [2]

Affiliations:

[1] Shanghai Institute of Technology, School of Computer Science and Information Engineering, Shanghai 201418, People's Republic of China

[2] Shanghai Institute of Technology, School of Mechanical Engineering, Shanghai 201418, People's Republic of China

Source:

IMAGE AND VISION COMPUTING | 2024 / Volume 147

Funding:

Natural Science Foundation of Shanghai;

Keywords:

RGB-D salient object detection; Cross-modal fusion; Multi-modal integration; Multi-level aggregation; IMAGE;

DOI:

10.1016/j.imavis.2024.105048

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although deep learning-based RGB-D salient object detection methods have achieved impressive results in recent years, some issues remain to be addressed, including multi-modal fusion and multi-level aggregation. In this paper, we propose a bifurcated multi-modal fusion network (BMFNet) to address these two issues cooperatively. First, we design a multi-modal feature interaction (MFI) module to fully capture the complementary information between the RGB and depth features by leveraging channel attention and spatial attention. Second, unlike the widely used layer-by-layer progressive fusion, we adopt a bifurcated fusion strategy for all the multi-level unimodal and cross-modal features to effectively reduce the gaps between features at different levels. For the intra-group feature aggregation, a multi-modal feature fusion (MFF) module is designed to integrate the intra-group multi-modal features to produce a low-level/high-level saliency feature. For the inter-group aggregation, a multi-scale feature learning (MFL) module is introduced to exploit the contextual interactions between different scales to boost fusion performance. Experimental results on five public RGB-D datasets demonstrate the effectiveness and superiority of our proposed network. The code and prediction maps will be available at https://github.com/ZhangQing0329/BMFNet
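
As a rough illustration of the idea behind combining channel attention and spatial attention across two modalities, the following is a minimal sketch in plain Python. It is not the MFI module from the paper: it has no learned parameters, the function names (`channel_attention`, `spatial_attention`, `gate`, `cross_modal_interaction`) are hypothetical, and the residual gating scheme is an assumption chosen for simplicity. Features are represented as nested lists of shape channels x height x width.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """One weight in (0, 1) per channel, from global average pooling."""
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values)))
    return weights


def spatial_attention(feat):
    """One weight in (0, 1) per pixel, from the mean across channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def gate(feat, ch_w, sp_map):
    """Scale feat[k][i][j] by ch_w[k] * sp_map[i][j]."""
    return [
        [[v * ch_w[k] * sp_map[i][j] for j, v in enumerate(row)]
         for i, row in enumerate(channel)]
        for k, channel in enumerate(feat)
    ]


def add(a, b):
    """Element-wise sum of two feature volumes of the same shape."""
    return [
        [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
        for ca, cb in zip(a, b)
    ]


def cross_modal_interaction(rgb, depth):
    """Assumed scheme: each modality is gated by attention computed from
    the other modality, added back as a residual, and the two enhanced
    features are summed into one fused feature."""
    rgb_out = add(rgb, gate(rgb, channel_attention(depth), spatial_attention(depth)))
    depth_out = add(depth, gate(depth, channel_attention(rgb), spatial_attention(rgb)))
    return rgb_out, depth_out, add(rgb_out, depth_out)
```

Calling `cross_modal_interaction(rgb, depth)` on two volumes of equal shape returns the two enhanced features and their sum. A real implementation would typically replace the parameter-free pooling here with learned convolutional or fully connected layers in a deep learning framework.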


Pages: 15

References: 82 in total

