MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition

Cited by: 12
Authors
Zhou, Feng [1]
Hu, Yong [2]
Shen, Xukun [1]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Beihang Univ, Sch New Media Art & Design, Beijing, Peoples R China
Keywords
Deep learning; Object recognition; Adversarial network; Multimodal
DOI
10.1007/s00371-018-1559-x
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper studies object recognition using RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from the lack of large-scale, manually labeled RGB-D data. Labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such datasets often exhibit a long tail, and the hard positive examples in the tail are difficult to recognize. To address these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is used to generate class-agnostic examples for each instance, which supports the training of MSANet. At the second level, an adversarial network generates class-specific hard positive examples while learning to classify them correctly, further improving the performance of MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; for example, our experiments show a 1.5% accuracy improvement on the benchmark Washington RGB-D Object Dataset over the current state of the art.
Pages: 1583-1594
Number of pages: 12