MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition

Citations: 12
Authors
Zhou, Feng [1]
Hu, Yong [2]
Shen, Xukun [1]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Beihang Univ, Sch New Media Art & Design, Beijing, Peoples R China
Keywords
Deep learning; Object recognition; Adversarial network; Multimodal;
DOI
10.1007/s00371-018-1559-x
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
This paper studies the problem of object recognition using RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from the lack of large-scale manually labeled RGB-D data. Labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such large-scale datasets often have a long tail, and the hard positive examples in the tail can hardly be recognized. To solve these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is leveraged to generate class-agnostic examples for each instance, which supports the training of MSANet. At the second level, an adversarial network is proposed to generate class-specific hard positive examples while learning to classify them correctly, further improving the performance of MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; for example, our experimental results indicate a 1.5% accuracy boost on the benchmark Washington RGB-D object dataset compared with the current state of the art.
Pages: 1583-1594
Page count: 12
Related papers
50 in total
[21]   Perception Subsystem for Object Recognition and Pose Estimation in RGB-D Images [J].
Kornuta, Tomasz ;
Laszkowski, Michal .
CHALLENGES IN AUTOMATION, ROBOTICS AND MEASUREMENT TECHNIQUES, 2016, 440 :597-607
[22]   RGB-D Object Recognition based on RGBD-PCANet Learning [J].
Sun, Shiying ;
Zhao, Xiaoguang ;
An, Ning ;
Tan, Min .
2017 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION (ICMA), 2017, :1075-1080
[23]   Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition [J].
Javed Imran ;
Balasubramanian Raman .
Journal of Ambient Intelligence and Humanized Computing, 2020, 11 :189-208
[24]   Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition [J].
Imran, Javed ;
Raman, Balasubramanian .
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020, 11 (01) :189-208
[25]   Hierarchical Alternate Interaction Network for RGB-D Salient Object Detection [J].
Li, Gongyang ;
Liu, Zhi ;
Chen, Minyu ;
Bai, Zhen ;
Lin, Weisi ;
Ling, Haibin .
IEEE Transactions on Image Processing, 2021, 30 :3528-3542
[26]   Hybrid-Attention Network for RGB-D Salient Object Detection [J].
Chen, Yuzhen ;
Zhou, Wujie .
APPLIED SCIENCES-BASEL, 2020, 10 (17)
[27]   Adaptive Depth Enhancement Network for RGB-D Salient Object Detection [J].
Yi, Kang ;
Li, Yumeng ;
Tang, Haoran ;
Xu, Jing .
IEEE SIGNAL PROCESSING LETTERS, 2025, 32 :176-180
[28]   Uniform and Variational Deep Learning for RGB-D Object Recognition and Person Re-Identification [J].
Ren, Liangliang ;
Lu, Jiwen ;
Feng, Jianjiang ;
Zhou, Jie .
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (10) :4970-4983
[29]   SAE-RNN Deep Learning for RGB-D Based Object Recognition [J].
Bai, Jing ;
Wu, Yan .
INTELLIGENT COMPUTING THEORY, 2014, 8588 :235-240
[30]   Semi-supervised learning and feature evaluation for RGB-D object recognition [J].
Cheng, Yanhua ;
Zhao, Xin ;
Huang, Kaiqi ;
Tan, Tieniu .
COMPUTER VISION AND IMAGE UNDERSTANDING, 2015, 139 :149-160