Is Pseudo-Lidar needed for Monocular 3D Object detection?

Cited by: 134

Authors:

Park, Dennis [1]

Ambrus, Rares [1]

Guizilini, Vitor [1]

Li, Jie [1]

Gaidon, Adrien [1]

Affiliations:

[1] Toyota Res Inst, Cambridge, MA 02139 USA

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Keywords:

DOI:

10.1109/ICCV48922.2021.00313

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D pointclouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single-stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.

Citation

Pages: 3122-3132

Page count: 11

References: 73 in total

