RGBD Salient Object Detection via Disentangled Cross-Modal Fusion

Times Cited: 79
Authors
Chen, Hao [1]
Deng, Yongjian [2]
Li, Youfu [2]
Hung, Tzu-Yi [3]
Lin, Guosheng [4]
Affiliations
[1] Nanyang Technol Univ, Delta NTU Corp Lab Cyber Phys Syst, Singapore 639798, Singapore
[2] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China
[3] Delta Res Ctr, Singapore, Singapore
[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Funding
National Research Foundation, Singapore;
Keywords
Image reconstruction; Feature extraction; Object detection; Topology; Image color analysis; Machine learning; Diversity reception; Disentangle; RGBD; saliency detection; NETWORK; ATTENTION; MODEL;
DOI
10.1109/TIP.2020.3014734
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depth is beneficial for salient object detection (SOD) because it provides additional saliency cues. Existing RGBD SOD methods focus on tailoring complicated cross-modal fusion topologies which, although they achieve encouraging performance, carry a high risk of over-fitting and make cross-modal complementarity difficult to study. Unlike these conventional approaches, which combine cross-modal features wholesale without differentiating them, we concentrate on decoupling the diverse cross-modal complements to simplify the fusion process and make the fusion more sufficient. We argue that if heterogeneous cross-modal representations can be explicitly disentangled, the cross-modal fusion process involves less uncertainty while enjoying better adaptability. To this end, we design a disentangled cross-modal fusion network that exposes structural and content representations from both modalities through cross-modal reconstruction. Across different scenes, the disentangled representations allow the fusion module to easily identify and incorporate the desired complements for informative multi-modal fusion. Extensive experiments demonstrate the effectiveness of our designs and show that our method outperforms state-of-the-art methods by a large margin.
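The abstract does not spell out the reconstruction objective. The following is a minimal sketch under stated assumptions, not the authors' code: each modality's feature vector is split into a structural code (the first `k` entries) and a modality-specific content code (the rest), and the decoder is a plain concatenation. `split_codes`, `decode`, and `cross_reconstruction_loss` are hypothetical names. The sketch illustrates one common reading of cross-modal reconstruction: each modality is rebuilt from the other modality's structural code plus its own content code. With this decoder, the loss vanishes only when the two structural codes agree, which is the pressure that pushes structure toward a shared representation and leaves modality-specific detail in the content codes.

```python
from typing import List, Tuple

Vector = List[float]


def split_codes(feature: Vector, k: int) -> Tuple[Vector, Vector]:
    """Hypothetical encoder: the first k entries are treated as the
    structural code, the remainder as the modality-specific content code."""
    return feature[:k], feature[k:]


def decode(structure: Vector, content: Vector) -> Vector:
    """Hypothetical identity decoder: concatenate structure and content."""
    return structure + content


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cross_reconstruction_loss(rgb: Vector, depth: Vector, k: int) -> float:
    """Rebuild each modality from the OTHER modality's structural code
    plus its own content code, then sum the two L1 reconstruction errors."""
    s_rgb, c_rgb = split_codes(rgb, k)
    s_dep, c_dep = split_codes(depth, k)
    rgb_hat = decode(s_dep, c_rgb)    # depth structure + RGB content
    depth_hat = decode(s_rgb, c_dep)  # RGB structure + depth content
    return l1(rgb_hat, rgb) + l1(depth_hat, depth)


if __name__ == "__main__":
    rgb = [1.0, 2.0, 0.5, 0.7]
    # Same structural code (first two entries): swapping loses nothing.
    print(cross_reconstruction_loss(rgb, [1.0, 2.0, 9.0, 3.0], k=2))  # 0.0
    # Structural codes differ in one entry: both reconstructions pay for it.
    print(cross_reconstruction_loss(rgb, [1.5, 2.0, 9.0, 3.0], k=2))  # 0.25
```

In a real network the split and the decoder would be learned convolutional modules and the loss would be one term among several; the fixed slicing here only makes the effect of swapping structural codes easy to inspect.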
Pages: 8407-8416
Page count: 10
References: 44