Is Pseudo-Lidar needed for Monocular 3D Object detection?

Cited by: 134

Authors:

Park, Dennis [1]

Ambrus, Rares [1]

Guizilini, Vitor [1]

Li, Jie [1]

Gaidon, Adrien [1]

Affiliations:

[1] Toyota Res Inst, Cambridge, MA 02139 USA

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Keywords:

DOI:

10.1109/ICCV48922.2021.00313

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent progress in 3D object detection from single images leverages monocular depth estimation as a way to produce 3D pointclouds, turning cameras into pseudo-lidar sensors. These two-stage detectors improve with the accuracy of the intermediate depth estimation network, which can itself be improved without manual labels via large-scale self-supervised learning. However, they tend to suffer from overfitting more than end-to-end methods, are more complex, and the gap with similar lidar-based detectors remains significant. In this work, we propose an end-to-end, single-stage, monocular 3D object detector, DD3D, that can benefit from depth pre-training like pseudo-lidar methods, but without their limitations. Our architecture is designed for effective information transfer between depth estimation and 3D detection, allowing us to scale with the amount of unlabeled pre-training data. Our method achieves state-of-the-art results on two challenging benchmarks, with 16.34% and 9.28% AP for Cars and Pedestrians (respectively) on the KITTI-3D benchmark, and 41.5% mAP on NuScenes.
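For readers unfamiliar with the term, the pseudo-lidar step the abstract refers to (turning a predicted depth map into a 3D pointcloud) can be sketched as a pinhole back-projection. This is only an illustrative sketch of that general idea, not the DD3D method and not code from the paper; the function name `backproject_depth`, the toy depth map, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions chosen for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift a dense depth map to 3D points in the camera frame (pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):      # v: pixel row index
        for u, z in enumerate(row):      # u: pixel column index, z: depth
            if z <= 0.0:                 # skip pixels without a valid depth
                continue
            x = (u - cx) * z / fx        # horizontal offset scaled by depth
            y = (v - cy) * z / fy        # vertical offset scaled by depth
            points.append((x, y, z))
    return points


# Toy 2x2 depth map in metres with made-up intrinsics (hypothetical values).
toy_depth = [[10.0, 10.0], [0.0, 20.0]]
cloud = backproject_depth(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

In a two-stage pseudo-lidar pipeline, a pointcloud like `cloud` would then be passed to a lidar-style 3D detector; the abstract's point is that DD3D avoids this intermediate representation while still benefiting from depth pre-training.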

Citation

Pages: 3122-3132

Number of pages: 11

