Camera and LiDAR-based point painted voxel region-based convolutional neural network for robust 3D object detection

Cited by: 2
Authors
Xie, Han [1 ]
Zheng, Wenqi [1 ]
Chen, Yunfan [2 ]
Shin, Hyunchul [1 ]
Affiliations
[1] Hanyang Univ, Dept Elect & Elect Engn, Ansan, South Korea
[2] Hubei Univ Technol, Sch Elect & Elect Engn, Wuhan, Hubei, Peoples R China
Keywords
three-dimensional object detection; LiDAR; fusion; computer vision; R-CNN
DOI
10.1117/1.JEI.31.5.053025
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Most three-dimensional (3D) object detection methods based on LiDAR point cloud data achieve relatively high performance in general cases. However, when the LiDAR points are noisy or corrupted, detection performance can be severely degraded. We propose a 3D object detection method that combines point cloud information with two-dimensional (2D) semantic segmentation information to enhance the feature representation for difficult cases, such as sparse, noisy, and partially absent data. Motivated by the PointPainting technique, we designed an early-stage fusion method based on a Voxel region-based convolutional neural network (R-CNN) architecture. The 2D semantic segmentation scores obtained by the PointPainting technique are appended to the raw point cloud data. The voxel-based features and 2D semantic information improve detection performance when the point cloud is corrupted. We also designed a multiscale hierarchical region-of-interest pooling strategy that reduces the computational cost of Voxel R-CNN by at least 43%. Our method shows results competitive with state-of-the-art methods on the standard KITTI dataset. In addition, three corrupted KITTI datasets, KITTI sparse (KITTI-S), KITTI jittering (KITTI-J), and KITTI dropout (KITTI-D), were used for robustness testing. With noisy LiDAR points, our proposed point-painted Voxel R-CNN achieved superior detection performance over the baseline Voxel R-CNN for the moderate case, with notable improvements of 11.13% in average precision (AP) for 3D object detection and 14.3% in AP for bird's-eye-view object detection. (c) 2022 SPIE and IS&T
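The early-stage fusion described in the abstract, in the spirit of PointPainting, projects each LiDAR point into the image and appends the 2D semantic segmentation scores at that pixel to the raw point features. The sketch below illustrates this idea; the function name, array layouts, and projection matrix are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def paint_points(points, seg_scores, P):
    """Append per-pixel semantic segmentation scores to LiDAR points
    (PointPainting-style early fusion; illustrative sketch only).

    points:     (N, 4) array of [x, y, z, intensity]
    seg_scores: (H, W, C) per-class softmax scores from a 2D segmentation net
    P:          (3, 4) projection matrix mapping homogeneous LiDAR
                coordinates to image pixels (e.g., a combined
                camera-intrinsics * LiDAR-to-camera transform)
    """
    H, W, C = seg_scores.shape
    n_feat = points.shape[1]

    # Project points to the image plane using homogeneous coordinates.
    pts_h = np.hstack([points[:, :3], np.ones((len(points), 1))])
    uvw = pts_h @ P.T
    u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)

    # Keep only points in front of the camera that land inside the image.
    valid = (uvw[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)

    # Painted points: original features plus C segmentation scores
    # (zeros for points that do not project into the image).
    painted = np.zeros((len(points), n_feat + C), dtype=points.dtype)
    painted[:, :n_feat] = points
    painted[valid, n_feat:] = seg_scores[v[valid], u[valid]]
    return painted
```

The painted (N, 4 + C) points would then be voxelized and fed to the Voxel R-CNN backbone in place of the raw 4-channel points.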
Pages: 13
Related References
21 references total
[1] Choi Y., 2020, arXiv
[2] Cordts M.; Omran M.; Ramos S.; Rehfeld T.; Enzweiler M.; Benenson R.; Franke U.; Roth S.; Schiele B. The Cityscapes Dataset for Semantic Urban Scene Understanding. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 3213-3223
[3] Deng J.; Zhou W.; Zhang Y.; Li H. From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(12): 4722-4734
[4] Deng J.J., 2021, AAAI Conference on Artificial Intelligence, Vol. 35, p. 1201
[5] Geiger A., 2012, Proceedings of CVPR IEEE, p. 3354, DOI 10.1109/CVPR.2012.6248074
[6] Neuhold G.; Ollmann T.; Bulo S.R.; Kontschieder P. The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 5000-5009
[7] OpenPCDet Development Team, 2020, OpenPCDet: an open-source toolbox for 3D object detection from point clouds
[8] Pang S.; Morris D.; Radha H. CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020: 10386-10393
[9] Qi C.R., 2017, Advances in Neural Information Processing Systems, Vol. 30
[10] Qi C.R.; Su H.; Mo K.; Guibas L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 77-85