Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Cited by: 56
Authors
Harley, Adam W. [1 ]
Fang, Zhaoyuan [2 ]
Li, Jie [3 ]
Ambrus, Rares [3 ]
Fragkiadaki, Katerina [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Toyota Res Inst, Los Altos, CA USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA | 2023
Keywords
VIEW;
DOI
10.1109/ICRA48891.2023.10160831
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect; even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to reconsider this commonly neglected part of the sensor platform.
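The "simple parameter-free lifter" mentioned in the abstract can be understood as projecting a fixed 3D voxel grid into each camera image and bilinearly sampling features there, with no learned depth estimation. Below is a minimal PyTorch-style sketch of that idea, assuming a particular tensor layout and camera convention; the function name, argument names, and shapes are illustrative, not the paper's actual code.

```python
# Hypothetical sketch of parameter-free lifting: project 3D voxel centers
# into each camera and bilinearly sample image features. Names and shapes
# are assumptions for illustration, not the Simple-BEV implementation.
import torch
import torch.nn.functional as F

def lift_to_bev(feats, intrinsics, extrinsics, voxel_xyz):
    """feats:      (B, N, C, H, W) per-camera image features
    intrinsics: (B, N, 3, 3) camera intrinsics
    extrinsics: (B, N, 4, 4) world-to-camera transforms
    voxel_xyz:  (V, 3) voxel centers in world coordinates
    returns:    (B, C, V) features averaged over cameras that see each voxel
    """
    B, N, C, H, W = feats.shape
    V = voxel_xyz.shape[0]
    ones = torch.ones(V, 1, device=voxel_xyz.device)
    xyz1 = torch.cat([voxel_xyz, ones], dim=1)  # (V, 4) homogeneous coords

    # World -> camera -> pixel coordinates, for every camera.
    cam = torch.einsum('bnij,vj->bnvi', extrinsics, xyz1)[..., :3]  # (B,N,V,3)
    pix = torch.einsum('bnij,bnvj->bnvi', intrinsics, cam)          # (B,N,V,3)
    z = pix[..., 2:3].clamp(min=1e-5)
    uv = pix[..., :2] / z                                           # (B,N,V,2)

    # Normalize to [-1, 1] for grid_sample; mask voxels outside the frustum.
    grid = torch.stack([uv[..., 0] / (W - 1), uv[..., 1] / (H - 1)], dim=-1)
    grid = grid * 2 - 1                                             # (B,N,V,2)
    valid = (cam[..., 2] > 0) & (grid.abs() <= 1).all(dim=-1)       # (B,N,V)

    sampled = F.grid_sample(                                        # (B*N,C,1,V)
        feats.reshape(B * N, C, H, W),
        grid.reshape(B * N, 1, V, 2),
        align_corners=True,
    ).view(B, N, C, V)

    # Average features across the cameras that actually observe each voxel.
    valid = valid.unsqueeze(2).float()                              # (B,N,1,V)
    return (sampled * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1)
```

Because the lifter has no learned parameters, the resulting (B, C, V) tensor can simply be reshaped into a BEV grid and passed to a standard 2D decoder, which matches the abstract's claim that the lifting strategy itself contributes less than factors like batch size and input resolution.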
Pages: 2759-2765
Number of pages: 7