MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

Cited by: 19
Authors
Chen, Ke [1 ]
Oldja, Ryan [1 ]
Smolyanskiy, Nikolai [1 ]
Birchfield, Stan [1 ]
Popov, Alexander [1 ]
Wehr, David [1 ]
Eden, Ibrahim [1 ]
Pehserl, Joachim [1 ]
Affiliations
[1] NVIDIA, Santa Clara, CA 95051 USA
Source
2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2020
DOI
10.1109/IROS45743.2020.9341450
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud projected onto a perspective view in order to semantically segment the scene. The second stage then processes the point cloud (along with semantic labels from the first stage) projected onto a bird's eye view, to detect and classify objects. Both stages use an encoder-decoder architecture. We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time. The system operates efficiently at 150 fps on an embedded GPU designed for a self-driving car, including a postprocessing step to maintain identities over time. We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.
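The two projections named in the abstract — a perspective (spherical "range image") view for the segmentation stage and a bird's-eye view for the detection stage — can be sketched as follows. This is an illustrative reconstruction only: the field-of-view limits, image dimensions, grid extent, and cell resolution below are common defaults for a 64-beam spinning LiDAR, not the paper's published parameters.

```python
import numpy as np

def to_perspective_view(points, h=64, w=2048, fov_up=3.0, fov_down=-25.0):
    """Project LiDAR points (N, 3) onto a spherical range image.

    Returns integer (row, col) pixel indices per point. The vertical
    field of view is an illustrative assumption, not the paper's value.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-6))   # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Azimuth maps to columns, elevation to rows.
    col = ((1.0 - (yaw + np.pi) / (2.0 * np.pi)) * w).astype(int) % w
    row = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * h
    row = np.clip(row.astype(int), 0, h - 1)
    return row, col

def to_birds_eye_view(points, x_range=(0.0, 80.0), y_range=(-40.0, 40.0),
                      res=0.25):
    """Map LiDAR points to cells of a top-down grid.

    Points outside the range are masked out. Extent and resolution are
    illustrative assumptions.
    """
    x, y = points[:, 0], points[:, 1]
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]))
    col = ((x[mask] - x_range[0]) / res).astype(int)
    row = ((y[mask] - y_range[0]) / res).astype(int)
    return row, col, mask
```

In the paper's pipeline, the per-pixel semantic labels produced on the first (perspective) projection are carried along with each point into the second (bird's-eye) projection, so the detection stage sees both geometry and semantics in the same top-down grid.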
Pages: 2288-2294
Page count: 7