MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition

Cited by: 12

Authors:

Zhou, Feng [1]

Hu, Yong [2]

Shen, Xukun [1]

Affiliations:

[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China

[2] Beihang Univ, Sch New Media Art & Design, Beijing, Peoples R China

Source:

VISUAL COMPUTER | 2019, Volume 35, Issue 11

Keywords:

Deep learning; Object recognition; Adversarial network; Multimodal;

DOI:

10.1007/s00371-018-1559-x

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

This paper studies the problem of object recognition using RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from the lack of large-scale manually labeled RGB-D data. Labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such large-scale datasets often have a long tail, and the hard positive examples in the tail can hardly be recognized. To solve these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is leveraged to generate class-agnostic examples for each instance, which supports the training of our MSANet. At the second level, an adversarial network is proposed to generate class-specific hard positive examples while learning to classify them correctly, further improving the performance of our MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; for example, our experimental results indicate a 1.5% accuracy boost on the benchmark Washington RGB-D object dataset compared with the current state of the art.
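The first augmentation level described in the abstract (label-preserving transformations applied to each instance) can be illustrated with a minimal sketch. The abstract does not say which transformations are used, so the horizontal flip and random crop below are assumptions, as are the names `flip_and_crop`, `augment_rgbd`, and `crop_pct`. The sketch only shows one property such a scheme would need: the same geometric transformation is applied to the RGB and depth images so they stay pixel-aligned, and the annotation is passed through unchanged.

```python
import random


def flip_and_crop(image, flip, top, left, height, width):
    """Apply one flip decision and one crop window to a row-major image."""
    rows = [row[::-1] for row in image] if flip else image
    return [row[left:left + width] for row in rows[top:top + height]]


def augment_rgbd(rgb, depth, label, rng, crop_pct=90):
    """Return one augmented (rgb, depth, label) triple.

    Hypothetical helper: the flip and crop are illustrative choices,
    not the transformations used in the paper.
    """
    h, w = len(depth), len(depth[0])
    ch, cw = h * crop_pct // 100, w * crop_pct // 100
    # Draw the random parameters once so both modalities share them.
    flip = rng.random() < 0.5
    top = rng.randint(0, h - ch)
    left = rng.randint(0, w - cw)
    return (
        flip_and_crop(rgb, flip, top, left, ch, cw),
        flip_and_crop(depth, flip, top, left, ch, cw),
        label,  # the annotation is kept as-is
    )


# Toy 10x20 pair: each RGB pixel stores its coordinates, depth stores x + y,
# which makes it easy to check that the two modalities remain aligned.
rng = random.Random(0)
rgb = [[(x, y, 0) for x in range(20)] for y in range(10)]
depth = [[float(x + y) for x in range(20)] for y in range(10)]
aug_rgb, aug_depth, aug_label = augment_rgbd(rgb, depth, "apple", rng)
```

Drawing the random parameters once and reusing them for both images is the design point here; sampling them separately per modality would break the correspondence between color and depth pixels.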

Pages: 1583-1594

Page count: 12
