RoIFusion: 3D Object Detection From LiDAR and Vision

Cited by: 34
Authors
Chen, Can [1 ]
Fragonara, Luca Zanotti [1 ]
Tsourdos, Antonios [1 ]
Affiliations
[1] Cranfield Univ, Sch Aerosp Transport & Mfg, Cranfield MK43 0AL, Beds, England
Source
IEEE ACCESS | 2021 / Vol. 9 / Issue 09
Keywords
Three-dimensional displays; Feature extraction; Two dimensional displays; Object detection; Neural networks; Detectors; Sensor fusion; Sensors fusion; 3D object detection; Region of Interests; neural network; segmentation network; point cloud; image; NETWORK;
DOI
10.1109/ACCESS.2021.3070379
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
When localizing and detecting 3D objects in autonomous driving scenes, information from multiple sensors (e.g., camera, LIDAR) can provide mutually complementary cues that enhance the robustness of 3D detectors. In this paper, a deep neural network architecture named RoIFusion is proposed to efficiently fuse multi-modality features for 3D object detection by leveraging the advantages of LIDAR and camera sensors. Instead of densely combining point-wise features of the point cloud with the related pixel features, our fusion method aggregates a small set of 3D Regions of Interest (RoIs) in the point cloud with the corresponding 2D RoIs in the images, which reduces the computation cost and avoids viewpoint misalignment during feature aggregation across sensors. Finally, extensive experiments on the challenging KITTI 3D object detection benchmark show the effectiveness of our fusion method and demonstrate that our deep fusion approach achieves state-of-the-art performance.
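The core idea in the abstract is to pair each 3D RoI in the point cloud with its corresponding 2D RoI in the image, so features from both sensors describe the same region. A minimal sketch of that correspondence step is shown below; this is an illustration under stated assumptions (a pinhole camera with intrinsic matrix `K`, 3D RoI corners already in the camera frame, and simple concatenation as the fusion operator), not the authors' implementation:

```python
import numpy as np

def project_3d_roi_to_2d(corners_3d, K):
    """Project the 8 corners of a 3D RoI (camera coordinates, shape (8, 3))
    onto the image plane using intrinsic matrix K, and return the tight
    axis-aligned 2D RoI (x_min, y_min, x_max, y_max) in pixels.

    Assumes all corners lie in front of the camera (z > 0)."""
    pts = corners_3d @ K.T             # (8, 3) homogeneous image points
    pts = pts[:, :2] / pts[:, 2:3]     # perspective divide -> pixel coords
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    return x_min, y_min, x_max, y_max

def fuse_roi_features(point_feat, image_feat):
    """Fuse a pooled point-cloud RoI feature with the pooled image RoI
    feature. Concatenation is used here as one simple fusion choice."""
    return np.concatenate([point_feat, image_feat], axis=-1)

# Hypothetical example: a 1 m cube 10 m in front of a 700 px focal camera.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0,   0.0,   1.0]])
corners = np.array([[sx * 0.5, sy * 0.5, 10.0 + sz * 0.5]
                    for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
roi_2d = project_3d_roi_to_2d(corners, K)
fused = fuse_roi_features(np.zeros(128), np.ones(64))  # shape (192,)
```

Because only a small set of RoI pairs is formed (rather than a dense point-to-pixel mapping), the per-frame fusion cost scales with the number of proposals instead of the number of points, which is the efficiency argument made in the abstract.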
Pages: 51710-51721
Page count: 12