Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Cited by: 116
Authors
Gao, Wei [1 ,2 ]
Liao, Guibiao [1 ,2 ]
Ma, Siwei [3 ]
Li, Ge [1 ,2 ]
Liang, Yongsheng [4 ]
Lin, Weisi [5 ]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Peking Univ, Inst Digital Media, Beijing 100871, Peoples R China
[4] Harbin Inst Technol, Sch Elect & Informat Engn, Shenzhen 518055, Peoples R China
[5] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Keywords
Dynamic cross-modal guided mechanism; RGB-D/RGB-T multi-modal data; information fusion; salient object detection; VISUAL-ATTENTION; COLOR-VISION; IMAGE; SEGMENTATION; MECHANISMS; MODEL
DOI
10.1109/TCSVT.2021.3082939
CLC Classification Codes
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
The use of complementary information, namely depth or thermal information, has shown its benefits to salient object detection (SOD) during recent years. However, the RGB-D and RGB-T SOD problems are currently solved only independently, and most methods directly extract and fuse raw features from backbones. Such methods can be easily restricted by low-quality modality data and redundant cross-modal features. In this work, a unified end-to-end framework is designed to simultaneously analyze RGB-D and RGB-T SOD tasks. Specifically, to effectively tackle multi-modal features, we propose a novel multi-stage and multi-scale fusion network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the visual color stage doctrine in the human visual system (HVS), the proposed CMFM aims to explore important feature representations in the feature response stage and integrate them into cross-modal features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of multi-level cross-modal fused features to capture both local and global information of salient objects, and can further boost multi-modal SOD performance. The proposed unified cross-modality feature analysis framework, based on two-stage and multi-scale information fusion, can be used for diverse multi-modal SOD tasks. Comprehensive experiments (~92K image-pairs) demonstrate that the proposed method consistently outperforms 21 other state-of-the-art methods on nine benchmark datasets. This validates that our proposed method works well on diverse multi-modal SOD tasks with good generalization and robustness, and provides a good multi-modal SOD benchmark.
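To make the described architecture concrete, the following is a minimal PyTorch-style sketch of a two-stage cross-modal fusion module and a bi-directional multi-scale decoder in the spirit of CMFM and BMD. All class names, channel widths, and internal operations here are illustrative assumptions, not the authors' actual implementation.

    # Hypothetical sketch of CMFM-like two-stage fusion and BMD-like
    # bi-directional multi-scale decoding. Internals are assumptions
    # for illustration only, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CMFMSketch(nn.Module):
        # Stage 1 ("feature response"): gate each modality with channel
        # attention, suppressing low-quality depth/thermal responses.
        # Stage 2 ("combination"): merge the two gated streams.
        def __init__(self, channels):
            super().__init__()
            self.gate_rgb = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
            self.gate_aux = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
            self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

        def forward(self, f_rgb, f_aux):          # f_aux: depth or thermal feature
            f_rgb = f_rgb * self.gate_rgb(f_rgb)  # re-weighted RGB response
            f_aux = f_aux * self.gate_aux(f_aux)  # re-weighted auxiliary response
            return self.fuse(torch.cat([f_rgb, f_aux], dim=1))

    class BMDSketch(nn.Module):
        # A top-down pass spreads global context to finer scales; a bottom-up
        # pass pushes refined local detail back, before a 1x1 prediction head.
        def __init__(self, channels, num_levels=3):
            super().__init__()
            self.td = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_levels))
            self.bu = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_levels))
            self.head = nn.Conv2d(channels, 1, 1)  # single-channel saliency logits

        def forward(self, feats):                 # feats ordered fine -> coarse
            feats = list(feats)
            for i in range(len(feats) - 2, -1, -1):   # top-down (coarse to fine)
                up = F.interpolate(feats[i + 1], size=feats[i].shape[2:],
                                   mode='bilinear', align_corners=False)
                feats[i] = self.td[i](feats[i] + up)
            for i in range(1, len(feats)):            # bottom-up (fine to coarse)
                down = F.adaptive_avg_pool2d(feats[i - 1], feats[i].shape[2:])
                feats[i] = self.bu[i](feats[i] + down)
            return self.head(feats[0])                # predict at the finest scale

    # Toy usage: three scales of per-modality features stand in for backbone outputs.
    cmfm, bmd = CMFMSketch(64), BMDSketch(64, num_levels=3)
    rgb = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
    aux = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
    fused = [cmfm(r, a) for r, a in zip(rgb, aux)]   # per-scale cross-modal fusion
    saliency = torch.sigmoid(bmd(fused))             # shape: (1, 1, 64, 64)

In a full model, the per-scale features would come from two backbone streams (RGB plus depth or thermal), and deep supervision would typically be applied at each decoder level; the random tensors above merely stand in for those backbone outputs.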
Pages: 2091-2106
Number of pages: 16