Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation

Citations: 87
Authors
Sun, Jiaming [1 ,2 ]
Chen, Linghao [1 ]
Xie, Yiming [1 ]
Zhang, Siyu [3 ]
Jiang, Qinhong [2 ]
Zhou, Xiaowei [1 ]
Bao, Hujun [1 ]
Affiliations
[1] Zhejiang Univ, State Key Lab CAD & CG, Hangzhou, Peoples R China
[2] SenseTime, Hong Kong, Peoples R China
[3] Southern Univ Sci & Technol, Shenzhen, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
DOI
10.1109/CVPR42600.2020.01056
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering a point cloud with disparity estimation and then applying a 3D detector. The disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the scarcity of disparity annotations for training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, even when LiDAR ground-truth is not available at training time, Disp R-CNN achieves competitive performance and outperforms previous state-of-the-art methods by 20% in terms of average precision. The code will be available at https://github.com/zju3dv/disprcnn.
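The abstract's idea of recovering a point cloud only for pixels on an object can be illustrated with the standard rectified-stereo relation Z = f * B / d. The sketch below is not the authors' iDispNet or their released code; it only assumes a pinhole camera, a disparity map in pixels, and a boolean instance mask. The function name, parameter names, and the numeric values in the usage lines are all hypothetical.

```python
import numpy as np

def instance_disparity_to_points(disparity, mask, focal_px, baseline_m, cx, cy):
    """Back-project the masked pixels of a disparity map to 3D camera-frame points.

    Hypothetical helper, assuming a rectified stereo pair and a pinhole model:
    depth Z = focal_px * baseline_m / disparity.
    """
    # Keep only pixels that belong to the instance and have positive disparity.
    valid = mask & (disparity > 0)
    v, u = np.nonzero(valid)          # row (v) and column (u) indices, row-major order
    d = disparity[valid]              # same row-major order as np.nonzero
    z = focal_px * baseline_m / d     # depth in metres
    x = (u - cx) * z / focal_px       # lateral offset from the principal point
    y = (v - cy) * z / focal_px       # vertical offset from the principal point
    return np.stack([x, y, z], axis=1)

# Illustrative values only (not taken from the paper).
disparity = np.zeros((4, 6))
disparity[1:3, 2:4] = 10.0            # a 2x2 patch with 10 px disparity
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:5] = True                 # mask is wider than the valid disparity patch
points = instance_disparity_to_points(
    disparity, mask, focal_px=700.0, baseline_m=0.54, cx=3.0, cy=2.0
)
print(points.shape)
```

Restricting the back-projection to the mask means only object pixels become 3D points, which is the cost saving the abstract attributes to instance-level disparity estimation; how the disparity itself is predicted is outside the scope of this sketch.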
Pages: 10545-10554
Number of pages: 10