Multimodal vehicle detection: fusing 3D-LIDAR and color camera data

Cited by: 152
Authors
Asvadi, Alireza [1 ]
Garrote, Luis [1 ]
Premebida, Cristiano [1 ]
Peixoto, Paulo [1 ]
Nunes, Urbano J. [1 ]
Affiliations
[1] Univ Coimbra, Inst Syst & Robot ISR UC, Dept Elect & Comp Engn DEEC, Coimbra, Portugal
Keywords
Multimodal data; Deep learning; Object detection; Fusion;
DOI
10.1016/j.patrec.2017.09.038
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most current successful object detection approaches are based on a class of deep learning models called Convolutional Neural Networks (ConvNets). While most existing object detection research focuses on using ConvNets with color image data, emerging application fields such as Autonomous Vehicles (AVs), which integrate a diverse set of sensors, require the processing of multisensor and multimodal information to provide a more comprehensive understanding of the real-world environment. This paper proposes a multimodal vehicle detection system integrating data from a 3D-LIDAR and a color camera. Data from the LIDAR and camera, in the form of three modalities, are the inputs to ConvNet-based detectors, which are later combined to improve vehicle detection. The modalities are: (i) an up-sampled representation of the sparse LIDAR range data, called the dense-Depth Map (DM); (ii) a high-resolution map from the LIDAR reflectance data, hereinafter called the Reflectance Map (RM); and (iii) an RGB image from a monocular color camera calibrated with respect to the LIDAR. Bounding Box (BB) detections in each of these modalities are jointly learned and fused by an Artificial Neural Network (ANN) late-fusion strategy to improve on the detection performance of each individual modality. The contribution of this paper is two-fold: (1) probing and evaluating 3D-LIDAR modalities for vehicle detection (specifically the depth and reflectance map modalities), and (2) joint learning and fusion of the independent ConvNet-based vehicle detectors (one per modality) using an ANN to obtain more accurate vehicle detection. The obtained results demonstrate that (1) DM and RM are very promising modalities for vehicle detection, and (2) the proposed fusion strategy achieves higher accuracy than each modality alone at all levels of difficulty (easy, moderate, hard) in the KITTI object detection dataset. (C) 2017 Elsevier B.V. All rights reserved.
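The ANN late-fusion step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes each modality's detector (DM, RM, RGB) outputs a confidence score for a spatially matched bounding box, and a small one-hidden-layer network combines the three scores into one fused confidence. The weights here are hand-set for illustration; in the paper they would be learned jointly.

```python
import math

def fuse_scores(scores, w_hidden, b_hidden, w_out, b_out):
    """Fuse per-modality detection confidences with a tiny MLP.

    scores    -- [dm_conf, rm_conf, rgb_conf] for one matched bounding box
    w_hidden  -- list of per-hidden-unit weight vectors (illustrative, not learned)
    Returns a fused confidence in (0, 1).
    """
    # Hidden layer: tanh activations over weighted sums of the modality scores
    hidden = [math.tanh(sum(w * s for w, s in zip(ws, scores)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    # Output unit: sigmoid squashes the combined hidden activations
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Example: confidences from the depth-map, reflectance-map, and RGB
# detectors for one spatially matched bounding box (illustrative values).
scores = [0.7, 0.6, 0.9]
w_hidden = [[1.0, 1.0, 1.0], [0.5, -0.5, 1.5]]  # two hidden units
b_hidden = [0.0, 0.0]
w_out = [2.0, 1.0]
b_out = -1.0
fused = fuse_scores(scores, w_hidden, b_hidden, w_out, b_out)
```

With agreeing high-confidence detections across modalities, the fused score exceeds each design's threshold region alone; a disagreement (e.g. a low RGB score at night) pulls the fused confidence down, which is the intuition behind score-level late fusion.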
Pages: 20-29 (10 pages)