Is Pseudo-Lidar needed for Monocular 3D Object detection?

Cited by: 134
Authors
Park, Dennis [1]
Ambrus, Rares [1]
Guizilini, Vitor [1]
Li, Jie [1]
Gaidon, Adrien [1]
Affiliation
[1] Toyota Res Inst, Cambridge, MA 02139 USA
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
DOI
10.1109/ICCV48922.2021.00313
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D pointclouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.
Pages: 3122-3132
Page count: 11