Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection

Cited by: 286
Authors
Chen, Hao [1 ]
Li, Youfu [1 ]
Su, Dan [1 ]
Affiliation
[1] City Univ Hong Kong, Dept Mech Engn, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, Peoples R China
Keywords
RGB-D; Convolutional neural networks; Multi-path; Saliency detection; Detection model; Video
DOI
10.1016/j.patcog.2018.08.007
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Paired RGB and depth images are becoming popular multi-modal data in computer vision tasks. Traditional methods based on Convolutional Neural Networks (CNNs) typically fuse RGB and depth by combining their deep representations at a late stage through a single path, which can be ambiguous and insufficient for fusing large amounts of cross-modal data. To address this issue, we propose a novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), in which the traditional two-stream architecture with a single fusion path is extended by splitting that path into a global-reasoning path and a local-capturing path, while introducing cross-modal interactions at multiple layers. Compared to traditional two-stream architectures, the MMCI net supplies more adaptive and flexible fusion flows, easing optimization and enabling sufficient and efficient fusion. Concurrently, the MMCI net is equipped with multi-scale perception ability (i.e., simultaneous global and local contextual reasoning). We take RGB-D saliency detection as an example task. Extensive experiments on three benchmark datasets show the improvement of the proposed MMCI net over other state-of-the-art methods. (C) 2018 Elsevier Ltd. All rights reserved.
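The abstract's fusion scheme can be sketched in a few lines: cross-modal interaction merges RGB and depth features at an intermediate layer, after which two parallel paths operate on the fused tensor, one collapsing it into a global context vector and one preserving spatial detail, before the paths are recombined. The NumPy sketch below is a minimal illustration of that data flow only; the element-wise summation and average pooling are illustrative assumptions, not the paper's actual interaction and path modules, and the function names (`cross_modal_interact`, `mmci_fuse`) are hypothetical.

```python
import numpy as np

def cross_modal_interact(rgb_feat, depth_feat):
    """Cross-modal interaction at an intermediate layer.

    Element-wise summation is a stand-in (assumption) for the
    paper's interaction modules; inputs are (C, H, W) feature maps.
    """
    return rgb_feat + depth_feat

def global_path(feat):
    # Global-reasoning path: global average pooling reduces a
    # (C, H, W) feature map to a per-channel context vector (C,).
    return feat.mean(axis=(1, 2))

def mmci_fuse(rgb_feat, depth_feat):
    # Fuse modalities, run the two parallel fusion paths, then
    # recombine: the global context vector is broadcast back over
    # the spatially preserved local-capturing path.
    fused = cross_modal_interact(rgb_feat, depth_feat)
    glob = global_path(fused)           # (C,)   global reasoning
    local = fused                       # (C, H, W) local capturing
    return local + glob[:, None, None]  # combine the two paths
```

The point of the two paths is that the output at every spatial location mixes a locally captured response with a globally pooled context, rather than forcing all fusion through one late-stage path.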
Pages: 376-385 (10 pages)
Related Papers
49 entries in total
[21]   Learning Rich Features from RGB-D Images for Object Detection and Segmentation [J].
Gupta, Saurabh ;
Girshick, Ross ;
Arbelaez, Pablo ;
Malik, Jitendra .
COMPUTER VISION - ECCV 2014, PT VII, 2014, 8695 :345-360
[22]   Advanced Deep-Learning Techniques for Salient and Category-Specific Object Detection: A Survey [J].
Han, Junwei ;
Zhang, Dingwen ;
Cheng, Gong ;
Liu, Nian ;
Xu, Dong .
IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (01) :84-100
[23]   Background Prior-Based Salient Object Detection via Deep Reconstruction Residual [J].
Han, Junwei ;
Zhang, Dingwen ;
Hu, Xintao ;
Guo, Lei ;
Ren, Jinchang ;
Wu, Feng .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2015, 25 (08) :1309-1321
[24]   Deep Residual Learning for Image Recognition [J].
He, Kaiming ;
Zhang, Xiangyu ;
Ren, Shaoqing ;
Sun, Jian .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :770-778
[25]   Human action recognition in RGB-D videos using motion sequence information and deep learning [J].
Ijjina, Earnest Paul ;
Chalavadi, Krishna Mohan .
PATTERN RECOGNITION, 2017, 72 :504-516
[26]   A model of saliency-based visual attention for rapid scene analysis [J].
Itti, L ;
Koch, C ;
Niebur, E .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1998, 20 (11) :1254-1259
[27]   Caffe: Convolutional Architecture for Fast Feature Embedding [J].
Jia, Yangqing ;
Shelhamer, Evan ;
Donahue, Jeff ;
Karayev, Sergey ;
Long, Jonathan ;
Girshick, Ross ;
Guadarrama, Sergio ;
Darrell, Trevor .
PROCEEDINGS OF THE 2014 ACM CONFERENCE ON MULTIMEDIA (MM'14), 2014, :675-678
[28]   Category-Independent Object-level Saliency Detection [J].
Jia, Yangqing ;
Han, Mei .
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, :1761-1768
[29]  
Ju R, 2014, IEEE IMAGE PROC, P1115, DOI 10.1109/ICIP.2014.7025222
[30]  
Kong L, 2018, PATTERN RECOGNIT