CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection

Cited by: 56
Authors
Chen, Gang [1 ]
Shao, Feng [1 ]
Chai, Xiongli [1 ]
Chen, Hangwei [1 ]
Jiang, Qiuping [1 ]
Meng, Xiangchao [1 ]
Ho, Yo-Sung [2 ]
Affiliations
[1] Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China
[2] Gwangju Inst Sci & Technol GIST, Sch Informat & Commun, Gwangju 500712, South Korea
Funding
National Natural Science Foundation of China; Zhejiang Provincial Natural Science Foundation;
Keywords
Feature extraction; Image edge detection; Task analysis; Object detection; Transformers; Semantics; Visualization; RGB-T salient object detection; modality difference; cross-guided fusion; transformer; FUSION NETWORK;
DOI
10.1109/TCSVT.2022.3166914
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology & Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
How to exploit the interaction between the RGB and thermal modalities is key to the success of RGB-T salient object detection (SOD). Most existing methods integrate multi-modality information by designing various fusion strategies. However, the modality gap between RGB and thermal features means that simple feature concatenation leads to unsatisfactory performance. To solve this problem, we propose a cross-guided modality difference reduction network (CGMDRNet) that achieves intrinsically consistent feature fusion by reducing modality differences. Specifically, we design a modality difference reduction (MDR) module, embedded in each layer of the backbone network, which uses a cross-guided strategy to reduce the modality difference between RGB and thermal features. A cross-attention fusion (CAF) module is then designed to fuse the cross-modality features whose modality differences have been reduced. In addition, we use a transformer-based feature enhancement (TFE) module to strengthen the high-level feature representations that contribute most to performance. Finally, the high-level features guide the fusion of low-level features to obtain a saliency map with clear boundaries. Extensive experiments on three public RGB-T datasets show that the proposed CGMDRNet achieves competitive performance against state-of-the-art (SOTA) RGB-T SOD models.
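The record contains no code, so as a rough illustration only, the minimal PyTorch sketch below shows the general idea behind a cross-attention fusion step such as the CAF module described in the abstract: RGB features attend over thermal features and the result is combined residually. The class name, layer layout, and parameters here are hypothetical and are not taken from the paper; CGMDRNet's actual CAF design may differ substantially.

```python
# Hypothetical sketch of cross-attention fusion between RGB and thermal
# feature maps (NCHW, equal shapes). Not the paper's implementation.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs project RGB features to queries and thermal
        # features to keys/values.
        self.query_rgb = nn.Conv2d(channels, channels, kernel_size=1)
        self.key_t = nn.Conv2d(channels, channels, kernel_size=1)
        self.value_t = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_rgb.shape
        # Flatten spatial dims: (B, C, H*W), with queries as (B, N, C).
        q = self.query_rgb(f_rgb).flatten(2).transpose(1, 2)  # (B, N, C)
        k = self.key_t(f_t).flatten(2)                        # (B, C, N)
        v = self.value_t(f_t).flatten(2).transpose(1, 2)      # (B, N, C)
        # RGB queries attend over thermal keys/values.
        attn = torch.softmax(q @ k * self.scale, dim=-1)      # (B, N, N)
        fused = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Residual combination keeps the original RGB stream intact.
        return f_rgb + fused

# Usage: fuse one backbone layer's RGB and thermal features.
if __name__ == "__main__":
    caf = CrossAttentionFusion(channels=64)
    f_rgb = torch.randn(1, 64, 44, 44)
    f_t = torch.randn(1, 64, 44, 44)
    print(caf(f_rgb, f_t).shape)  # torch.Size([1, 64, 44, 44])
```

In a real two-stream SOD network, one such block would typically sit at each backbone stage, after the modality-difference-reduction step, so that fusion operates on features whose modality gap has already been narrowed.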
Pages: 6308-6323
Number of pages: 16