Is Pseudo-Lidar needed for Monocular 3D Object detection?

Cited by: 134
Authors
Park, Dennis [1]
Ambrus, Rares [1]
Guizilini, Vitor [1]
Li, Jie [1]
Gaidon, Adrien [1]
Affiliations
[1] Toyota Res Inst, Cambridge, MA 02139 USA
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
DOI
10.1109/ICCV48922.2021.00313
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D pointclouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single-stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.
Pages: 3122-3132
Page count: 11