RGB-D Scene Recognition via Spatial-Related Multi-Modal Feature Learning

Cited by: 15

Authors:

Xiong, Zhitong [1,2]

Yuan, Yuan [1]

Wang, Qi [1,2]

Affiliations:

[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Shaanxi, Peoples R China

[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Shaanxi, Peoples R China

Source:

IEEE Access | 2019 / Vol. 7

Funding:

National Natural Science Foundation of China;

Keywords:

RGB-D; scene recognition; global and local features; multi-modal feature learning;

DOI:

10.1109/ACCESS.2019.2932080

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

RGB-D image-based scene recognition has achieved significant performance improvement with the development of deep learning methods. While convolutional neural networks can learn high-semantic level features for object recognition, these methods still have limitations for RGB-D scene classification. One limitation is that learning better multi-modal features for RGB-D scene recognition remains an open problem. Another limitation is that scene images are usually not object-centric and exhibit great spatial variability. Thus, vanilla full-image CNN features may not be optimal for scene recognition. Considering these problems, in this paper, we propose a compact and effective framework for RGB-D scene recognition. Specifically, we make the following contributions: 1) a novel RGB-D scene recognition framework is proposed to explicitly learn the global modal-specific and local modal-consistent features simultaneously. Different from existing approaches, local CNN features are considered for the learning of modal-consistent representations; 2) a Key Feature Selection (KFS) module is designed, which can adaptively select important local features from the high-semantic level CNN feature maps. It is more efficient and effective than methods based on object detection and dense patch sampling; and 3) a triplet correlation loss and a spatial-attention similarity loss are proposed for the training of the KFS module. Under the supervision of the proposed loss functions, the network can learn important local features of the two modalities with no need for extra annotations. Finally, by concatenating the global and local features together, the proposed framework achieves new state-of-the-art scene recognition performance on the SUN RGB-D dataset and the NYU Depth version 2 (NYUD v2) dataset.


Pages: 106739-106747

Page count: 9

