Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?

Cited by: 56
Authors
Harley, Adam W. [1 ]
Fang, Zhaoyuan [2 ]
Li, Jie [3 ]
Ambrus, Rares [3 ]
Fragkiadaki, Katerina [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[3] Toyota Res Inst, Los Altos, CA USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA | 2023
Keywords
VIEW;
DOI
10.1109/ICRA48891.2023.10160831
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably "lifted" from the multi-camera images onto the 2D ground plane, yielding a "bird's eye view" (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel "lifting" methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect; even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance, and invite the community to reconsider this commonly neglected part of the sensor platform.
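The "simple parameter-free lifter" mentioned in the abstract can be understood as projecting a fixed 3D voxel grid into each camera image and bilinearly sampling features there, with no learned depth estimation. Below is a minimal PyTorch-style sketch of that idea, assuming a particular tensor layout and camera convention; the function name, argument names, and shapes are illustrative, not the paper's actual code.

```python
# Hypothetical sketch of parameter-free lifting: project 3D voxel centers
# into each camera and bilinearly sample image features. Names and shapes
# are assumptions for illustration, not the Simple-BEV implementation.
import torch
import torch.nn.functional as F

def lift_to_bev(feats, intrinsics, extrinsics, voxel_xyz):
    """feats:      (B, N, C, H, W) per-camera image features
    intrinsics: (B, N, 3, 3) camera intrinsics
    extrinsics: (B, N, 4, 4) world-to-camera transforms
    voxel_xyz:  (V, 3) voxel centers in world coordinates
    returns:    (B, C, V) features averaged over cameras that see each voxel
    """
    B, N, C, H, W = feats.shape
    V = voxel_xyz.shape[0]
    ones = torch.ones(V, 1, device=voxel_xyz.device)
    xyz1 = torch.cat([voxel_xyz, ones], dim=1)  # (V, 4) homogeneous coords

    # World -> camera -> pixel coordinates, for every camera.
    cam = torch.einsum('bnij,vj->bnvi', extrinsics, xyz1)[..., :3]  # (B,N,V,3)
    pix = torch.einsum('bnij,bnvj->bnvi', intrinsics, cam)          # (B,N,V,3)
    z = pix[..., 2:3].clamp(min=1e-5)
    uv = pix[..., :2] / z                                           # (B,N,V,2)

    # Normalize to [-1, 1] for grid_sample; mask voxels outside the frustum.
    grid = torch.stack([uv[..., 0] / (W - 1), uv[..., 1] / (H - 1)], dim=-1)
    grid = grid * 2 - 1                                             # (B,N,V,2)
    valid = (cam[..., 2] > 0) & (grid.abs() <= 1).all(dim=-1)       # (B,N,V)

    sampled = F.grid_sample(                                        # (B*N,C,1,V)
        feats.reshape(B * N, C, H, W),
        grid.reshape(B * N, 1, V, 2),
        align_corners=True,
    ).view(B, N, C, V)

    # Average features across the cameras that actually observe each voxel.
    valid = valid.unsqueeze(2).float()                              # (B,N,1,V)
    return (sampled * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1)
```

Because the lifter has no learned parameters, the resulting (B, C, V) tensor can simply be reshaped into a BEV grid and passed to a standard 2D decoder, which matches the abstract's claim that the lifting strategy itself contributes less than factors like batch size and input resolution.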
Pages: 2759-2765
Number of pages: 7