RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Cited by: 15
Authors
Xiong, Zhitong [1, 2]
Yuan, Yuan [1]
Wang, Qi [1, 2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
RGB-D; scene recognition; global and local features; multi-modal feature learning;
DOI
10.1109/ACCESS.2019.2932080
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
RGB-D image-based scene recognition has achieved significant performance improvement with the development of deep learning methods. While convolutional neural networks (CNNs) can learn high-semantic-level features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another limitation is that scene images are usually not object-centric and exhibit great spatial variability. Thus, vanilla full-image CNN features may not be optimal for scene recognition. Considering these problems, in this paper we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn the global modal-specific and local modal-consistent features simultaneously. Different from existing approaches, local CNN features are considered for the learning of modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which can adaptively select important local features from the high-semantic-level CNN feature maps. It is more efficient and effective than object detection and dense patch-sampling based methods; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for the training of the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities with no need for extra annotations. Finally, by concatenating the global and local features, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D dataset and the NYU Depth version 2 (NYUD v2) dataset.
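The final fusion step described in the abstract, concatenating global features with a small set of selected local features from each modality, can be illustrated with a minimal sketch. This is not the authors' KFS module: the abstract does not specify how local features are scored, so the sketch assumes a simple L2-norm score as a stand-in for the learned, loss-supervised selection, and every function name here (`global_average`, `select_key_features`, `fuse`) is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def global_average(features: List[Vector]) -> Vector:
    """Average all spatial positions into one global descriptor."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def select_key_features(features: List[Vector], k: int) -> List[Vector]:
    """Keep the k positions with the largest L2 norm.

    ASSUMPTION: the L2 norm is only a placeholder score; the paper's KFS
    module learns its selection under dedicated loss functions.
    """
    def l2_norm(f: Vector) -> float:
        return math.sqrt(sum(x * x for x in f))

    return sorted(features, key=l2_norm, reverse=True)[:k]


def fuse(rgb_map: List[Vector], depth_map: List[Vector], k: int) -> Vector:
    """Concatenate both global descriptors with k local features per modality."""
    fused: Vector = []
    fused.extend(global_average(rgb_map))
    fused.extend(global_average(depth_map))
    for local in select_key_features(rgb_map, k) + select_key_features(depth_map, k):
        fused.extend(local)
    return fused


# Toy feature maps: 4 spatial positions with 2 channels each.
rgb_map = [[0.1, 0.2], [3.0, 4.0], [0.0, 0.1], [1.0, 1.0]]
depth_map = [[0.5, 0.5], [0.0, 0.0], [2.0, 0.0], [0.1, 0.1]]
descriptor = fuse(rgb_map, depth_map, k=1)
```

With `k=1`, the descriptor holds two 2-channel global vectors followed by one selected local vector per modality, giving 8 values in total. In a real network the feature maps would come from CNN backbones and the fused vector would feed a classifier.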
Pages: 106739-106747
Number of pages: 9