Cross-Modal Attentional Context Learning for RGB-D Object Detection

Cited by: 59
Authors
Li, Guanbin [1]
Gan, Yukang [1]
Wu, Hejun [1]
Xiao, Nong [1]
Lin, Liang [1]
Affiliations
[1] Sun Yat-sen University, School of Data and Computer Science, Guangzhou 510006, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
RGB-D object detection; attentional context modeling; cross modal feature; convolutional neural network; RECOGNITION;
DOI
10.1109/TIP.2018.2878956
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing objects from simultaneously sensed photometric (RGB) and depth channels is a fundamental yet practical problem in many machine vision applications, such as robot grasping and autonomous driving. In this paper, we address this problem by developing a cross-modal attentional context (CMAC) learning framework, which enables the full exploitation of the context information from both RGB and depth data. Compared to existing RGB-D object detection frameworks, our approach has several appealing properties. First, it includes an attention-based global context model that exploits adaptive contextual information and incorporates it into a region-based CNN (e.g., Fast R-CNN) framework to improve object detection performance. Second, our CMAC framework contains a fine-grained object part attention module that harnesses multiple discriminative object parts inside each candidate object region for a superior local feature representation. Besides greatly improving the accuracy of RGB-D object detection, the effective cross-modal information fusion and attentional context modeling in our proposed model provide an interpretable visualization scheme. Experimental results demonstrate that the proposed method significantly improves upon the state of the art on all public benchmarks.
Pages: 1591-1601
Number of pages: 11