BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection

Citations: 2

Authors:

Sun, Chenwang [1]

Zhang, Qing [1]

Zhuang, Chenyu [1]

Zhang, Mingqian [2]

Affiliations:

[1] Shanghai Institute of Technology, School of Computer Science and Information Engineering, Shanghai 201418, People's Republic of China

[2] Shanghai Institute of Technology, School of Mechanical Engineering, Shanghai 201418, People's Republic of China

Source:

Image and Vision Computing | 2024 / Vol. 147

Funding:

Natural Science Foundation of Shanghai

Keywords:

RGB-D salient object detection; Cross-modal fusion; Multi-modal integration; Multi-level aggregation; IMAGE;

DOI:

10.1016/j.imavis.2024.105048

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although deep learning-based RGB-D salient object detection methods have achieved impressive results in recent years, some issues remain to be addressed, including multi-modal fusion and multi-level aggregation. In this paper, we propose a bifurcated multi-modal fusion network (BMFNet) to address these two issues jointly. First, we design a multi-modal feature interaction (MFI) module that captures the complementary information between the RGB and depth features by leveraging channel attention and spatial attention. Second, unlike the widely used layer-by-layer progressive fusion, we adopt a bifurcated fusion strategy for all the multi-level unimodal and cross-modal features to reduce the gaps between features at different levels. For intra-group feature aggregation, a multi-modal feature fusion (MFF) module integrates the intra-group multi-modal features to produce a low-level or high-level saliency feature. For inter-group aggregation, a multi-scale feature learning (MFL) module exploits the contextual interactions between different scales to boost fusion performance. Experimental results on five public RGB-D datasets demonstrate the effectiveness and superiority of our proposed network. The code and prediction maps will be available at https://github.com/ZhangQing0329/BMFNet
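
Illustrative note: the abstract states that the MFI module uses channel attention and spatial attention to exchange information between RGB and depth features, but does not describe its internals. The following is a minimal NumPy sketch of one generic way such cross-modal attention could be wired, and is not the authors' implementation. The function names (`channel_attention`, `spatial_attention`, `cross_modal_interaction`), the pooling choices, and the residual connection are all assumptions made for illustration; a real module would use learned convolutions or MLPs rather than the parameter-free pooling shown here.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Per-channel weights in (0, 1) from global average pooling.

    feat has shape (C, H, W); the result has shape (C, 1, 1).
    Assumption: no learned layers, pooling followed by a sigmoid only.
    """
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))


def spatial_attention(feat):
    """Per-pixel weights in (0, 1) from the mean over channels.

    feat has shape (C, H, W); the result has shape (1, H, W).
    """
    return sigmoid(feat.mean(axis=0, keepdims=True))


def cross_modal_interaction(rgb, depth):
    """Hypothetical interaction: each modality is reweighted by attention
    maps computed from the other modality, then added back residually."""
    rgb_out = rgb + rgb * channel_attention(depth) * spatial_attention(depth)
    depth_out = depth + depth * channel_attention(rgb) * spatial_attention(rgb)
    return rgb_out, depth_out


# Toy feature maps: 8 channels on a 16x16 grid (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 16, 16))
depth_feat = rng.standard_normal((8, 16, 16))

rgb_enh, depth_enh = cross_modal_interaction(rgb_feat, depth_feat)
print(rgb_enh.shape, depth_enh.shape)
```

The residual form keeps the original unimodal features intact while letting the other modality emphasize channels and locations it considers informative; whether BMFNet uses this exact arrangement is not stated in the abstract.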


Pages: 15

