FCFIG-Net: feature complementary fusion and information-guided network for RGB-D salient object detection

Cited: 0
Authors
Du, Haishun [1 ,2 ]
Qiao, Kangyi [1 ,2 ]
Zhang, Wenzhe [1 ]
Zhang, Zhengyang [1 ]
Wang, Sen [1 ]
Affiliations
[1] Henan Univ, Sch Artificial Intelligence, Zhengzhou 450046, Peoples R China
[2] Int Joint Res Lab Cooperat Vehicular Networks Hena, Zhengzhou 450046, Peoples R China
Keywords
Salient object detection; Feature complementarity; Contextual information; Information-guided decoding;
DOI
10.1007/s11760-024-03489-3
Chinese Library Classification (CLC)
TM [Electrical technology]; TN [Electronic technology, communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
RGB-D salient object detection (SOD) aims to detect salient objects by fusing the salient information in RGB and depth images. Although the cross-modal information fusion strategies employed by existing RGB-D SOD models can effectively fuse information from different modalities, most of them ignore the contextual information that is gradually diluted during the feature fusion process, and they do not fully exploit the common and global information in bimodal images. In addition, although the decoding strategies adopted by existing RGB-D SOD models can effectively decode multi-level fused features, most of them neither fully mine the semantic information contained in high-level fused features and the detail information contained in low-level fused features, nor fully utilize such information to steer the decoding, which in turn leads to generated saliency maps with poor structural completeness and detail richness. To overcome these problems, we propose a feature complementary fusion and information-guided network (FCFIG-Net) for RGB-D SOD, which consists of a feature complementary fusion encoder and an information-guided decoder. In FCFIG-Net, the encoder and decoder cooperate not only to enhance and fuse the multi-modal features and the contextual information during feature encoding, but also to fully utilize the semantic and detail information in the features during feature decoding. Concretely, we first design a feature complementary enhancement module (FCEM), which enhances the representational capability of features from different modalities by exploiting the information complementarity among them.
Then, to supplement the contextual information that is gradually diluted during the feature fusion process, we design a global contextual information extraction module (GCIEM) that extracts global contextual information from deep encoded features. Furthermore, we design a multi-modal feature fusion module (MFFM), which sufficiently fuses bimodal features and global contextual information by fully mining and enhancing the common and global information contained in the bimodal features. Using FCEM, GCIEM, MFFM, and a ResNet50 backbone, we build the feature complementary fusion encoder. In addition, we design a guidance decoding unit (GDU). Finally, using the GDU and an existing cascaded decoder, we build the information-guided decoder (IGD), which achieves high-quality step-by-step decoding of multi-level fused features by fully utilizing the semantic information in high-level fused features and the detail information in low-level fused features. Extensive experiments on six widely used RGB-D datasets indicate that FCFIG-Net reaches the current state of the art.
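The cross-modal complementary enhancement idea behind FCEM can be illustrated with a toy sketch. The gating form below (each modality re-weighted by a sigmoid gate computed from the other, then added back residually) is an assumption chosen for illustration, not the paper's published implementation; function and variable names are hypothetical:

```python
import math

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def complementary_enhance(rgb_feat, depth_feat):
    """Toy cross-modal complementary enhancement: each modality's
    features are re-weighted by a gate derived from the other modality,
    so information weak in one stream can be supplemented by the other."""
    rgb_out = [r + r * sigmoid(d) for r, d in zip(rgb_feat, depth_feat)]
    depth_out = [d + d * sigmoid(r) for r, d in zip(rgb_feat, depth_feat)]
    return rgb_out, depth_out

# Example with scalar "features" standing in for feature-map channels.
rgb = [0.5, -1.0, 2.0]
depth = [1.0, 0.0, -2.0]
r_e, d_e = complementary_enhance(rgb, depth)
```

In the actual network the gates would be learned (e.g. convolutions followed by a sigmoid over feature maps) rather than the raw values used here; the sketch only conveys the bidirectional enhance-then-residual pattern.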
Pages: 8547-8563 (17 pages)