Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Times cited: 1
Authors
Xiao, Yun [1, 2, 4]
Huang, Yameng [3]
Li, Chenglong [1, 2, 4]
Liu, Lei [3]
Zhou, Aiwu [3]
Tang, Jin [3]
Affiliations
[1] Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Depth estimation; Lightweight network; Multi-modal representation learning; NETWORK
DOI
10.1007/s12559-023-10148-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object's surface, can supplement visible information and is receiving increasing interest in SOD. However, depth sensors work only under limited conditions and ranges (e.g., 4-5 m at most in indoor scenes), and their imaging quality is usually low. We design a lightweight network that infers depth features at reduced computational complexity; it needs only a few parameters to capture depth-specific features effectively by fusing high-level features from the RGB modality. Because both RGB features and inferred depth features may contain noise, we design a fusion network, comprising a self-attention-based feature interaction module and a foreground-background enhancement module, to fuse RGB and depth features adaptively. In addition, we introduce a multi-scale fusion module with different dilated convolutions to leverage useful local and global context cues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGBD SOD methods and performs comparably to state-of-the-art RGB SOD methods. These results on multiple RGBD and RGB SOD datasets indicate that our multi-modal representation learning method can overcome the imaging limitations of single-modality data for RGB salient object detection.
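The abstract mentions a multi-scale fusion module built from dilated convolutions with different rates. The following is only a minimal 1-D sketch of that general idea, not the authors' module: the function names (`dilated_conv1d`, `multi_scale_fuse`, `receptive_field`), the dilation rates (1, 2, 4), and the choice to fuse branches by plain averaging are all assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Same'-padded 1-D dilated convolution (cross-correlation form).

    Assumes an odd-length kernel so the output aligns with the input.
    """
    k = len(kernel)
    half = (k // 2) * dilation  # zero padding needed on each side
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            # Kernel taps are spaced `dilation` samples apart.
            acc += kernel[j] * padded[i + j * dilation]
        out.append(acc)
    return out


def multi_scale_fuse(signal, kernel, dilations=(1, 2, 4)):
    """Run one branch per dilation rate and average the branches.

    Averaging is a hypothetical fusion rule chosen for simplicity.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def receptive_field(kernel_size, dilation):
    """Span of input samples covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    box = [1, 1, 1]
    print(dilated_conv1d(impulse, box, 1))
    print(dilated_conv1d(impulse, box, 2))
    print(multi_scale_fuse(impulse, box, dilations=(1, 2)))
    print(receptive_field(3, 4))
```

Small dilation rates respond to neighbouring samples (local context), while larger rates cover a wider span with the same number of kernel weights, which is why combining several rates is a common way to mix local and global context at low parameter cost.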
Pages: 1868-1883
Page count: 16