3D Vehicle Detection Using Multi-Level Fusion From Point Clouds and Images

Cited by: 26
Authors
Zhao, Kun [1 ,2 ]
Ma, Lingfei [3 ,4 ]
Meng, Yu [1 ]
Liu, Li [1 ]
Wang, Junbo [2 ]
Marcato, Jose, Jr. [5 ]
Goncalves, Wesley Nunes [5 ,6 ]
Li, Jonathan [2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Coll Mech Engn, Beijing 100083, Peoples R China
[2] Univ Waterloo, Dept Geog & Environm Management, Waterloo, ON N2L 3G1, Canada
[3] Cent Univ Finance & Econ, Engn Res Ctr State Financial Secur, Minist Educ, Beijing 102206, Peoples R China
[4] Cent Univ Finance & Econ, Sch Stat & Math, Beijing 102206, Peoples R China
[5] Univ Fed Mato Grosso do Sul, Fac Engn Architecture & Urbanism & Geog, BR-79070900 Campo Grande, MS, Brazil
[6] Univ Fed Mato Grosso do Sul, Fac Comp Sci, BR-79070900 Campo Grande, MS, Brazil
Funding
National Natural Science Foundation of China;
Keywords
Point cloud compression; Feature extraction; Three-dimensional displays; Detectors; Proposals; Electronic mail; Shape; 3D vehicle detection; deep learning; autonomous driving; false detection; point cloud processing; data fusion; OBJECT DETECTION;
DOI
10.1109/TITS.2021.3137392
Chinese Library Classification number
TU [Architectural Science];
Discipline classification code
0813;
Abstract
3D vehicle detectors based on point clouds generally achieve higher detection performance than multi-sensor detectors. However, lacking texture information, point-based methods miss many occluded and distant vehicles and produce high-confidence false detections of similarly shaped objects, which poses a potential threat to traffic safety. Fusion-based methods therefore have more potential in the long run. This paper presents a multi-level fusion network for 3D vehicle detection from point clouds and images. The fusion network includes three stages: data-level fusion of point clouds and images, feature-level fusion of voxel and Bird's Eye View (BEV) features in the point cloud branch, and feature-level fusion of point clouds and images. In addition, a novel coarse-fine detection header is proposed that mimics two-stage detectors, generating coarse proposals on the encoder and refining them on the decoder. Extensive experiments show that the proposed network performs better on occluded and distant vehicles and reduces false detections of similarly shaped objects, demonstrating its superiority over some state-of-the-art detectors on the challenging KITTI benchmark. Ablation studies also demonstrate the effectiveness of each designed module.
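The abstract does not say how the data-level fusion of point clouds and images is carried out. As one common way to fuse the two modalities at the data level, the sketch below projects each 3D point into the image with a 3x4 projection matrix and appends the RGB value of the pixel it lands on. The function names, the plain-Python data layout, and the assumption that the points are already expressed in the camera frame are all hypothetical and illustrative; none of them is taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]        # (x, y, z) in the camera frame (assumption)
Pixel = Tuple[int, int, int]              # (r, g, b)
Matrix = Sequence[Sequence[float]]        # 3x4 projection matrix, camera frame -> pixels


def project_point(p: Point, proj: Matrix) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates (u, v).

    Returns None when the point lies on or behind the image plane.
    """
    homogeneous = (p[0], p[1], p[2], 1.0)
    u, v, w = (sum(row[i] * homogeneous[i] for i in range(4)) for row in proj)
    if w <= 0:
        return None
    return u / w, v / w


def decorate_points(
    points: Sequence[Point],
    image: Sequence[Sequence[Pixel]],     # image[row][col] -> (r, g, b)
    proj: Matrix,
) -> List[Tuple[float, float, float, int, int, int]]:
    """Append the colour of the pixel each point projects onto.

    Points that fall outside the image or behind the camera are dropped,
    so every returned point carries both geometry and texture.
    """
    height, width = len(image), len(image[0])
    fused = []
    for p in points:
        uv = project_point(p, proj)
        if uv is None:
            continue
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= col < width and 0 <= row < height:
            r, g, b = image[row][col]
            fused.append((p[0], p[1], p[2], r, g, b))
    return fused
```

In a sketch like this, the decorated points (x, y, z, r, g, b) would then be fed to the point cloud branch in place of the raw coordinates; whether the paper's network does this or uses a different data-level scheme is not stated in the abstract.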
Pages: 15146-15154
Page count: 9