A cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection

Cited by: 45
Authors
Liu, Zhengyi [1 ]
Zhang, Wei [1 ]
Zhao, Peng [1 ]
Affiliations
[1] Anhui Univ, Sch Comp Sci & Technol, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
RGB-D salient object detection; Generative adversarial network; Cross-modal guidance; Adaptive gated fusion;
DOI
10.1016/j.neucom.2020.01.045
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Salient object detection in RGB-D images aims to identify, in a pair of color and depth images, the objects that most attract an observer's attention. As an important branch of salient object detection, it focuses on two major challenges: how to achieve cross-modal fusion that is efficient and beneficial for salient object detection, and how to effectively extract information from depth images of relatively poor quality. This paper proposes a cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection using color and depth images. Specifically, the generator adopts a double-stream encoder-decoder architecture and receives the RGB and depth images simultaneously. The proposed depthwise separable residual convolution module processes deep semantic information, and the processed features are progressively combined with the side-output features of the encoder network. To compensate for the poor quality of the depth image, the proposed method adds cross-modal guidance from the side-output features of the RGB stream to the decoder of the depth stream. A gated fusion module adaptively fuses the features of the two streams, and the resulting gated fusion saliency map is sent to the discriminator network, which distinguishes it from the ground-truth map. Adversarial learning yields better generator and discriminator networks, and the gated fusion saliency map generated by the best generator serves as the final result. Experiments on five public RGB-D datasets demonstrate the effectiveness of the cross-modal fusion, the depthwise separable residual convolution, and the adaptive gated fusion. Compared with state-of-the-art methods, our method achieves better performance. (C) 2020 Elsevier B.V. All rights reserved.
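The abstract names two architectural components without implementation detail: a depthwise separable residual convolution module and an adaptive gated fusion module. The following is a minimal PyTorch sketch of one plausible reading of both; the class names, layer sizes, and the per-pixel sigmoid gate are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn


class DepthwiseSeparableResidualBlock(nn.Module):
    # Depthwise separable convolution (as in Xception) wrapped in a
    # residual connection; an assumed form of the paper's module for
    # processing deep semantic features.
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.pointwise(self.depthwise(x))
        return self.relu(self.bn(out) + x)  # residual shortcut


class GatedFusion(nn.Module):
    # Adaptive gated fusion of RGB-stream and depth-stream features:
    # a learned gate g in [0, 1] weights the two modalities per pixel,
    # producing a convex combination of the streams.
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        return g * rgb_feat + (1.0 - g) * depth_feat


if __name__ == "__main__":
    rgb = torch.randn(1, 64, 56, 56)
    depth = torch.randn(1, 64, 56, 56)
    fused = GatedFusion(64)(DepthwiseSeparableResidualBlock(64)(rgb), depth)
    print(fused.shape)  # torch.Size([1, 64, 56, 56])

In this sketch the gate is computed from both modalities, so when the depth features are unreliable the network can learn to push g toward 1 and rely mainly on the RGB stream, which matches the abstract's motivation for adaptive fusion.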
Pages: 210-220
Number of pages: 11