When CNNs meet random RNNs: Towards multi-level analysis for RGB-D object and scene recognition

Cited by: 10
Authors
Caglayan, Ali [1,3]
Imamoglu, Nevrez [1,3]
Can, Ahmet Burak [2]
Nakamura, Ryosuke [1,3]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Artificial Intelligence Res Ctr AIRC, Tokyo, Japan
[2] Hacettepe Univ, Dept Comp Engn, Ankara, Turkey
[3] Natl Inst Adv Ind Sci & Technol, Digital Architecture Res Ctr, Tokyo, Japan
Keywords
Convolutional neural networks; Randomized neural networks; Transfer learning; RGB-D object recognition; RGB-D scene recognition; Fusion network; Approximation; Features
DOI
10.1016/j.cviu.2022.103373
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Object and scene recognition are two challenging but essential tasks in image understanding. In particular, the use of RGB-D sensors for these tasks has emerged as an important focus area for better visual understanding. Meanwhile, deep neural networks, specifically convolutional neural networks (CNNs), have become widespread and have been applied to many visual tasks by replacing hand-crafted features with effective deep features. However, how to effectively exploit deep features from a multi-layer CNN model remains an open problem. In this paper, we propose a novel two-stage framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks. In the first stage, a pretrained CNN model is employed as a backbone to extract visual features at multiple levels. The second stage efficiently maps these features to high-level representations with a fully randomized structure of recursive neural networks (RNNs). To cope with the high dimensionality of CNN activations, a random weighted pooling scheme is proposed by extending the idea of randomness in RNNs. Multi-modal fusion is performed through a soft voting approach, with weights computed from the individual recognition confidences (i.e., SVM scores) of the RGB and depth streams separately. This yields consistent class label estimates in the final RGB-D classification. Extensive experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative, robust features. Comparative results on the popular Washington RGB-D Object and SUN RGB-D Scene datasets show that the proposed approach achieves superior or on-par performance compared to state-of-the-art methods in both object and scene recognition tasks. Code is available at https://github.com/acaglayan/CNN_randRNN.
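The abstract describes two key mechanisms: encoding multi-level CNN activations with a fully randomized recursive network, and fusing RGB and depth predictions by confidence-weighted soft voting over SVM scores. The following is a minimal NumPy sketch of both ideas, not the authors' released implementation (available at https://github.com/acaglayan/CNN_randRNN); the activation-map size, block size, number of random RNNs, and the margin-based confidence measure are illustrative assumptions.

import numpy as np

def random_rnn_encode(fmap, num_rnns=8, block=2, seed=0):
    # Recursively merge non-overlapping block x block neighborhoods of a CNN
    # activation map (C, H, W) with fixed random weights and a tanh nonlinearity
    # until a single C-dimensional vector remains; concatenate the outputs of
    # num_rnns independent random RNNs. H and W are assumed to be powers of block.
    rng = np.random.default_rng(seed)
    c = fmap.shape[0]
    outputs = []
    for _ in range(num_rnns):
        # one fixed random weight matrix per RNN, mapping block*block children to a parent
        W = rng.standard_normal((c, block * block * c)) / np.sqrt(block * block * c)
        x = fmap
        while x.shape[1] > 1 or x.shape[2] > 1:
            c_, h_, w_ = x.shape
            # group spatial positions into non-overlapping block x block children
            x = x.reshape(c_, h_ // block, block, w_ // block, block)
            x = x.transpose(1, 3, 0, 2, 4).reshape(h_ // block, w_ // block, -1)
            # merge every group of children with the same random weights
            x = np.tanh(x @ W.T).transpose(2, 0, 1)
        outputs.append(x.reshape(-1))
    return np.concatenate(outputs)  # fixed-length feature fed to a linear SVM

def soft_vote(rgb_scores, depth_scores):
    # Weight each modality's SVM scores by its own confidence (here, the margin
    # between the two highest class scores -- an assumed confidence measure) and
    # combine them to obtain the fused class decision.
    def margin(s):
        top2 = np.sort(s)[-2:]
        return top2[1] - top2[0]
    w_rgb, w_d = margin(rgb_scores), margin(depth_scores)
    fused = (w_rgb * np.asarray(rgb_scores) + w_d * np.asarray(depth_scores)) / (w_rgb + w_d + 1e-12)
    return int(np.argmax(fused))

# Toy usage: a fake 512 x 8 x 8 activation map and 10-class SVM scores per modality.
feature = random_rnn_encode(np.random.rand(512, 8, 8))
label = soft_vote(np.random.rand(10), np.random.rand(10))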
Pages: 13