A uniform transformer-based structure for feature fusion and enhancement for RGB-D saliency detection

Cited by: 12
Authors
Wang, Yue [1 ]
Jia, Xu [1 ]
Zhang, Lu [1 ]
Li, Yuke [2 ]
Elder, James H. [3 ]
Lu, Huchuan [1 ]
Affiliations
[1] Dalian Univ Technol, Dalian 116024, Peoples R China
[2] Univ Calif Berkeley, Berkeley, CA 94804 USA
[3] York Univ, Toronto, ON M3J 1P3, Canada
Keywords
Saliency detection; RGB-D image; Transformer; Attention; Object detection; Network
DOI
10.1016/j.patcog.2023.109516
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
RGB-D saliency detection integrates information from RGB images and depth maps to improve the prediction of salient regions under challenging conditions. The key is to fully mine and fuse information at multiple scales across the two modalities. Previous approaches tend to apply multi-scale and multi-modal fusion separately via local operations, which fail to capture long-range dependencies. Here we propose a transformer-based structure to address this issue. The proposed architecture is composed of two modules: an Intra-modality Feature Enhancement Module (IFEM) and an Inter-modality Feature Fusion Module (IFFM). IFEM first enhances the features at each scale by selecting and integrating complementary information from other scales within the same modality. IFFM then performs thorough feature fusion by integrating features from multiple scales and both modalities over all positions simultaneously. We show that the transformer is a uniform operation that is highly effective for both feature enhancement and feature fusion, and that it simplifies the model design. Extensive experiments on five benchmark datasets demonstrate that the proposed network performs favorably against state-of-the-art RGB-D saliency detection methods. Furthermore, our model is efficient, with fewer FLOPs and a smaller model size than competing methods. (C) 2023 Elsevier Ltd. All rights reserved.
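The core idea in the abstract, attending jointly over tokens from all positions, scales, and modalities instead of fusing them with separate local operations, can be sketched with a toy scaled dot-product attention. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, grid sizes, and random projection matrices are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k):
    # Random projections stand in for learned Q/K/V weights (illustrative only)
    rng = np.random.default_rng(0)
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * d ** -0.5 for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    # Every token attends to every other token, regardless of
    # which modality or scale it came from
    attn = softmax(q @ k.T / np.sqrt(d_k))
    return attn @ v

rng = np.random.default_rng(1)
# Hypothetical feature tokens from two modalities at two scales,
# each spatial grid flattened to (positions, channels)
rgb_s1 = rng.random((16, 32))  # RGB,   4x4 grid, 32-d features
rgb_s2 = rng.random((4, 32))   # RGB,   2x2 grid
dep_s1 = rng.random((16, 32))  # depth, 4x4 grid
dep_s2 = rng.random((4, 32))   # depth, 2x2 grid

# Joint attention over all positions, scales, and modalities at once;
# a purely local (e.g. convolutional) fusion could not relate a depth
# token to a distant RGB token in a single step
tokens = np.concatenate([rgb_s1, rgb_s2, dep_s1, dep_s2], axis=0)  # (40, 32)
fused = self_attention(tokens, d_k=32)
print(fused.shape)  # (40, 32)
```

Because the same attention operation serves both roles, restricting the token set to one modality's scales gives enhancement (as in IFEM), while mixing both modalities gives fusion (as in IFFM), which is the "uniform operation" point made above.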
Pages: 12