MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Cited by: 35
Authors
Liao, Guibiao [1]
Gao, Wei [1]
Jiang, Qiuping [2]
Wang, Ronggang [1]
Li, Ge [1]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Ningbo Univ, Ningbo, Zhejiang, Peoples R China
Source
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020
Keywords
Salient object detection; RGB-D image; cross-modal guided attention; adversarial combination; IMAGE; SEGMENTATION
DOI
10.1145/3394171.3413523
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from RGB and depth backbones. Such methods are easily limited by low-quality depth maps and redundant cross-modal features. To effectively capture multi-scale cross-modal fusion features, this paper proposes a novel Multi-stage and Multi-Scale Fusion Network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Similar to the mechanism of the visual color stage doctrine in the human visual system, the proposed CMFM explores the useful and important feature representations in the feature response stage and integrates them into cross-modal fusion features in the adversarial combination stage. Moreover, the proposed BMD learns the combination of cross-modal fusion features from multiple levels to capture both local and global information of salient objects, further boosting the performance of the proposed method. Comprehensive experiments demonstrate that the proposed method achieves consistently superior performance over 14 other state-of-the-art methods on six popular RGB-D datasets when evaluated by 8 different metrics.
Pages: 2436-2444
Page count: 9