Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Cited by: 116
Authors
Gao, Wei [1, 2]
Liao, Guibiao [1, 2]
Ma, Siwei [3]
Li, Ge [1, 2]
Liang, Yongsheng [4]
Lin, Weisi [5]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Peking Univ, Inst Digital Media, Beijing 100871, Peoples R China
[4] Harbin Inst Technol, Sch Elect & Informat Engn, Shenzhen 518055, Peoples R China
[5] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Keywords
Dynamic cross-modal guided mechanism; RGB-D/RGB-T multi-modal data; information fusion; salient object detection; VISUAL-ATTENTION; COLOR-VISION; IMAGE; SEGMENTATION; MECHANISMS; MODEL;
DOI
10.1109/TCSVT.2021.3082939
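Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract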
The use of complementary information, namely depth or thermal information, has shown its benefits for salient object detection (SOD) in recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most existing methods directly extract and fuse raw features from backbones. Such methods can be easily restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to handle the RGB-D and RGB-T SOD tasks simultaneously. Specifically, to effectively tackle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM aims to explore important feature representations in the feature response stage and integrate them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, and can further boost multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be used for diverse multi-modal SOD tasks. Comprehensive experiments (approximately 92K image pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that the proposed method works well on diverse multi-modal SOD tasks with good generalization and robustness, and provides a good multi-modal SOD benchmark.
Pages: 2091 - 2106
Number of pages: 16
Related papers
50 in total
  • [21] Interactive context-aware network for RGB-T salient object detection
    Wang, Yuxuan
    Dong, Feng
    Zhu, Jinchao
    Chen, Jianren
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (28) : 72153 - 72174
  • [22] TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
    Liu, Zhengyi
    Wang, Yuan
    Tu, Zhengzheng
    Xiao, Yun
    Tang, Bin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4481 - 4490
  • [23] Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Wan, Bin
    Zhou, Xiaofei
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
    ENTROPY, 2024, 26 (02)
  • [24] PSNet: Parallel symmetric network for RGB-T salient object detection
    Bi, Hongbo
    Wu, Ranwan
    Liu, Ziqi
    Zhang, Jiayuan
    Zhang, Cong
    Xiang, Tian-Zhu
    Wang, Xiufang
    NEUROCOMPUTING, 2022, 511 : 410 - 425
  • [25] EDGE-Net: an edge-guided enhanced network for RGB-T salient object detection
    Zheng, Xin
    Wang, Boyang
    Ai, Liefu
    Tang, Pan
    Liu, Deyang
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (06) : 63032
  • [26] Asymmetric deep interaction network for RGB-D salient object detection
    Wang, Feifei
    Li, Yongming
    Wang, Liejun
    Zheng, Panpan
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266
  • [27] MSEDNet: Multi-scale fusion and edge-supervised network for RGB-T salient object detection
    Peng, Daogang
    Zhou, Weiyi
    Pan, Junzhen
    Wang, Danhao
    NEURAL NETWORKS, 2024, 171 : 410 - 422
  • [28] MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection
    Chenxing Xia
    Difeng Chen
    Xiuju Gao
    Bin Ge
    Kuan-Ching Li
    Xianjin Fang
    Yan Zhang
    Ke Yang
    The Journal of Supercomputing, 2024, 80 : 2487 - 2513
  • [29] MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection
    Xia, Chenxing
    Chen, Difeng
    Gao, Xiuju
    Ge, Bin
    Li, Kuan-Ching
    Fang, Xianjin
    Zhang, Yan
    Yang, Ke
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (02) : 2487 - 2513
  • [30] CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection
    Wang, Jie
    Song, Kechen
    Bao, Yanqi
    Huang, Liming
    Yan, Yunhui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (05) : 2949 - 2961