RGBD Salient Object Detection via Disentangled Cross-Modal Fusion

Cited by: 79

Authors:

Chen, Hao [1]

Deng, Yongjian [2]

Li, Youfu [2]

Hung, Tzu-Yi [3]

Lin, Guosheng [4]

Affiliations:

[1] Nanyang Technol Univ, Delta NTU Corp Lab Cyber Phys Syst, Singapore 639798, Singapore

[2] City Univ Hong Kong, Dept Mech Engn, Hong Kong, Peoples R China

[3] Delta Res Ctr, Singapore, Singapore

[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2020 / Volume 29 / Issue 29

Funding:

National Research Foundation, Singapore

Keywords:

Image reconstruction; Feature extraction; Object detection; Topology; Image color analysis; Machine learning; Diversity reception; Disentangle; RGBD; saliency detection; NETWORK; ATTENTION; MODEL;

DOI:

10.1109/TIP.2020.3014734

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405

Abstract:

Depth benefits salient object detection (SOD) by providing additional saliency cues. Existing RGBD SOD methods focus on tailoring complicated cross-modal fusion topologies, which, although they achieve encouraging performance, carry a high risk of over-fitting and are ambiguous in how they study cross-modal complementarity. Unlike these conventional approaches, which combine cross-modal features wholesale without differentiating them, we concentrate on decoupling the diverse cross-modal complements to simplify the fusion process and make the fusion more sufficient. We argue that if cross-modal heterogeneous representations can be disentangled explicitly, the cross-modal fusion process carries less uncertainty while enjoying better adaptability. To this end, we design a disentangled cross-modal fusion network that exposes structural and content representations from both modalities through cross-modal reconstruction. For different scenes, the disentangled representations allow the fusion module to easily identify and incorporate the desired complements for informative multi-modal fusion. Extensive experiments show the effectiveness of our designs and a large improvement over state-of-the-art methods.

Pages: 8407-8416

Page count: 10

References: 44 in total

