A uniform transformer-based structure for feature fusion and enhancement for RGB-D saliency detection

Cited by: 12
Authors
Wang, Yue [1 ]
Jia, Xu [1 ]
Zhang, Lu [1 ]
Li, Yuke [2 ]
Elder, James H. [3 ]
Lu, Huchuan [1 ]
Affiliations
[1] Dalian Univ Technol, Dalian 116024, Peoples R China
[2] Univ Calif Berkeley, Berkeley, CA 94804 USA
[3] York Univ, Toronto, ON M3J 1P3, Canada
Keywords
Saliency detection; RGB-D image; Transformer; Attention; Object detection; Network
DOI
10.1016/j.patcog.2023.109516
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
RGB-D saliency detection integrates information from RGB images and depth maps to improve the prediction of salient regions under challenging conditions. The key is to fully mine and fuse information at multiple scales across the two modalities. Previous approaches tend to apply multi-scale and multi-modal fusion separately via local operations, which fail to capture long-range dependencies. Here we propose a transformer-based structure to address this issue. The proposed architecture is composed of two modules: an Intra-modality Feature Enhancement Module (IFEM) and an Inter-modality Feature Fusion Module (IFFM). IFEM first enhances the features at each scale by selecting and integrating complementary information from other scales within the same modality. IFFM then performs thorough feature fusion by integrating features from multiple scales and both modalities over all positions simultaneously. We show that the transformer is a uniform operation that is highly effective for both feature enhancement and feature fusion, and that it simplifies the model design. Extensive experiments on five benchmark datasets demonstrate that the proposed network performs favorably against state-of-the-art RGB-D saliency detection methods. Furthermore, our model is efficient, with fewer FLOPs and a smaller model size than competing methods. (C) 2023 Elsevier Ltd. All rights reserved.
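The core idea in the abstract, attending jointly over tokens from all positions, scales, and modalities instead of fusing them with separate local operations, can be sketched with a toy scaled dot-product attention. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, grid sizes, and random projection matrices are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k):
    # Random projections stand in for learned Q/K/V weights (illustrative only)
    rng = np.random.default_rng(0)
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * d ** -0.5 for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    # Every token attends to every other token, regardless of
    # which modality or scale it came from
    attn = softmax(q @ k.T / np.sqrt(d_k))
    return attn @ v

rng = np.random.default_rng(1)
# Hypothetical feature tokens from two modalities at two scales,
# each spatial grid flattened to (positions, channels)
rgb_s1 = rng.random((16, 32))  # RGB,   4x4 grid, 32-d features
rgb_s2 = rng.random((4, 32))   # RGB,   2x2 grid
dep_s1 = rng.random((16, 32))  # depth, 4x4 grid
dep_s2 = rng.random((4, 32))   # depth, 2x2 grid

# Joint attention over all positions, scales, and modalities at once;
# a purely local (e.g. convolutional) fusion could not relate a depth
# token to a distant RGB token in a single step
tokens = np.concatenate([rgb_s1, rgb_s2, dep_s1, dep_s2], axis=0)  # (40, 32)
fused = self_attention(tokens, d_k=32)
print(fused.shape)  # (40, 32)
```

Because the same attention operation serves both roles, restricting the token set to one modality's scales gives enhancement (as in IFEM), while mixing both modalities gives fusion (as in IFFM), which is the "uniform operation" point made above.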
Pages: 12