MSANet: multimodal self-augmentation and adversarial network for RGB-D object recognition

Cited by: 12
Authors
Zhou, Feng [1]
Hu, Yong [2]
Shen, Xukun [1]
Affiliations
[1] Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China
[2] Beihang Univ, Sch New Media Art & Design, Beijing, Peoples R China
Keywords
Deep learning; Object recognition; Adversarial network; Multimodal
DOI
10.1007/s00371-018-1559-x
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper addresses the problem of object recognition using RGB-D data. Although deep convolutional neural networks have made progress in this area, they still suffer from the lack of large-scale manually labeled RGB-D data. Labeling a large-scale RGB-D dataset is time-consuming and tedious. More importantly, such large-scale datasets often exhibit a long tail, and the hard positive examples in the tail can hardly be recognized. To solve these problems, we propose a multimodal self-augmentation and adversarial network (MSANet) for RGB-D object recognition, which augments the data effectively at two levels while preserving the annotations. At the first level, a series of transformations is used to generate class-agnostic examples for each instance, which supports the training of MSANet. At the second level, an adversarial network generates class-specific hard positive examples while learning to classify them correctly, further improving the performance of MSANet. With these schemes, the proposed approach achieves the best results on several available RGB-D object recognition datasets; for example, our experiments show a 1.5% accuracy improvement on the benchmark Washington RGB-D object dataset over the current state of the art.
Pages: 1583-1594
Number of pages: 12