PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module

Cited by: 0
Authors
Xie, Liang [1 ,2 ]
Xiang, Chao [1 ]
Yu, Zhengxu [1 ]
Xu, Guodong [1 ,2 ]
Yang, Zheng [2 ]
Cai, Deng [1 ,3 ]
He, Xiaofei [1 ,2 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Fabu Inc, Hangzhou, Peoples R China
[3] Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou, Peoples R China
Source
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
LiDAR point clouds and RGB images are both essential for 3D object detection, so many state-of-the-art 3D detection algorithms focus on fusing these two types of data effectively. However, existing fusion methods based on Bird's Eye View (BEV) or voxel representations are imprecise. In this paper, we propose a novel fusion approach, the Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. Beyond continuous convolution, we add a Point-Pooling operation and an Attentive Aggregation step to make the fused features more expressive. Building on the PACF module, we further propose a 3D multi-sensor, multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which jointly handles image segmentation and 3D object detection. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN achieves substantial improvements in 3D object detection. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, where our method achieves state-of-the-art results on the 3D AP metric.
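The fusion recipe the abstract describes (continuous convolution over neighbouring points, point-pooling, then attentive aggregation of the branches) can be sketched in toy NumPy form. This is an illustrative sketch only, not the paper's implementation: `pacf_fuse` and `knn` are hypothetical names, and a fixed random projection stands in for the learned continuous-convolution MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs: N LiDAR points with C-dim semantic features lifted from the image.
N, C = 6, 8
points = rng.normal(size=(N, 3))      # 3D coordinates
sem_feats = rng.normal(size=(N, C))   # per-point semantic features from segmentation

def knn(points, k):
    """Indices of the k nearest neighbours of every point (excluding itself)."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def pacf_fuse(points, sem_feats, k=3):
    """PACF-style fusion sketch: continuous conv + point-pooling + attention."""
    idx = knn(points, k)                                # (N, k)
    offsets = points[idx] - points[:, None]             # (N, k, 3) geometric offsets
    neigh = sem_feats[idx]                              # (N, k, C) neighbour features
    # Continuous convolution: weight neighbour features by a function of the
    # offsets (a fixed random projection stands in for the learned MLP).
    W = rng.normal(size=(3, C))
    cont = (np.tanh(offsets @ W) * neigh).mean(axis=1)  # (N, C)
    # Point-pooling: element-wise max over the neighbourhood.
    pooled = neigh.max(axis=1)                          # (N, C)
    # Attentive aggregation: softmax weights over the two branches.
    logits = np.stack([cont.sum(-1), pooled.sum(-1)], axis=1)
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return attn[:, :1] * cont + attn[:, 1:] * pooled    # (N, C) fused features

fused = pacf_fuse(points, sem_feats)
print(fused.shape)  # (6, 8)
```

In the actual network, the fused per-point features would feed the 3D detection head, while the semantic features come from the segmentation sub-network rather than random toy data.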
Pages: 12460-12467 (8 pages)