Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Cited by: 7
Authors
Li, Yangguang [1 ]
Huang, Bin [2 ]
Chen, Zeren [3 ]
Cui, Yufeng [3 ]
Liang, Feng [4 ]
Shen, Mingzhu [3 ]
Liu, Fenggang [1 ]
Xie, Enze [5 ]
Sheng, Lu [3 ]
Ouyang, Wanli [6 ]
Shao, Jing [6 ]
Affiliations
[1] SenseTime Res, Beijing 100080, Peoples R China
[2] Hozon New Energy Automobile Co Ltd, Shanghai 200062, Peoples R China
[3] Beihang Univ, Beijing 100191, Peoples R China
[4] Univ Texas Austin, Austin, TX 78712 USA
[5] Univ Hong Kong, Pok Fu Lam, Hong Kong, Peoples R China
[6] Shanghai AI Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-camera; bird's-eye view (BEV) representation; autonomous driving;
DOI
10.1109/TPAMI.2024.3414835
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, perception tasks based on the Bird's-Eye View (BEV) representation have drawn increasing attention, and the BEV representation is promising as the foundation of next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources for on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on on-vehicle chips. Toward this goal, we first empirically find that the BEV representation can be sufficiently powerful without an expensive transformer-based transformation or depth representation. Fast-BEV consists of five parts: we propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, and (3) an efficient BEV encoder specifically designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, and (5) a multi-frame feature fusion mechanism to leverage temporal information. Among them, (1) and (3) make Fast-BEV fast at inference and friendly to deployment on on-vehicle chips, while (2), (4), and (5) ensure that Fast-BEV has competitive performance. Together, these make Fast-BEV a solution with high performance, fast inference speed, and deployment friendliness on the on-vehicle chips of autonomous driving. In experiments on a 2080Ti platform, our R50 model runs at 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model (Li et al., 2022) and the 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model (J. Huang and G. Huang, 2022). Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips.
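The lightweight view transformation described in the abstract amounts to projecting predefined 3D voxel centers into each camera's feature map and gathering 2D features by index, without predicting per-pixel depth. The following is a minimal NumPy sketch of such a projection-based lookup, included only to illustrate the idea; the function names, voxel-grid layout, and nearest-pixel gathering are assumptions for illustration, not the authors' implementation, which may differ in how the lookup is precomputed and how features from multiple cameras are fused.

# Minimal, illustrative sketch (not the authors' code) of a projection-based
# 2D-to-3D view transformation: voxel centers are projected into the image with
# known camera extrinsics/intrinsics, and image features are gathered by lookup.
import numpy as np

def build_projection_lut(voxel_centers, cam_to_ego, intrinsics, img_hw):
    # For each voxel, precompute the flat pixel index it projects to
    # (-1 if it falls behind the camera or outside the feature map).
    # voxel_centers: (N, 3) voxel-center coordinates in the ego frame
    # cam_to_ego:    (4, 4) camera-to-ego extrinsic matrix
    # intrinsics:    (3, 3) camera intrinsic matrix
    # img_hw:        (H, W) size of the image feature map
    ego_to_cam = np.linalg.inv(cam_to_ego)
    pts = np.concatenate([voxel_centers, np.ones((len(voxel_centers), 1))], axis=1)
    cam_pts = (ego_to_cam @ pts.T).T[:, :3]            # voxels in the camera frame
    depth = cam_pts[:, 2]
    uvw = (intrinsics @ cam_pts.T).T                   # perspective projection
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(np.int64)
    v = np.round(uv[:, 1]).astype(np.int64)
    h, w = img_hw
    valid = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    lut = np.full(len(voxel_centers), -1, dtype=np.int64)
    lut[valid] = v[valid] * w + u[valid]               # flat pixel index per voxel
    return lut

def lift_features(img_feat, lut):
    # Gather 2D features into the voxel volume using the precomputed lookup table.
    # img_feat: (C, H, W) image feature map; returns (N, C) voxel features,
    # zeros where a voxel has no valid projection.
    c, h, w = img_feat.shape
    flat = img_feat.reshape(c, h * w)
    out = np.zeros((len(lut), c), dtype=img_feat.dtype)
    hit = lut >= 0
    out[hit] = flat[:, lut[hit]].T
    return out

Because the camera geometry is fixed for a given frame, the per-voxel pixel indices can be computed once and reused, so the on-vehicle runtime cost of lift_features reduces to a single gather; this is what makes this style of view transformation fast and deployment-friendly compared with transformer-based or depth-based alternatives.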
Pages: 8665-8679
Page count: 15
Related Papers
42 entries in total
  • [1] M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
    Brazil, Garrick
    Liu, Xiaoming
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 9286 - 9295
  • [2] Caesar H, 2020, PROC CVPR IEEE, P11618, DOI 10.1109/CVPR42600.2020.01164
  • [3] Chen ZH, 2022, arXiv, DOI arXiv:2204.11582
  • [4] Vision meets robotics: The KITTI dataset
    Geiger, A.
    Lenz, P.
    Stiller, C.
    Urtasun, R.
    [J]. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2013, 32 (11) : 1231 - 1237
  • [5] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [6] Huang J., 2022, arXiv
  • [7] Huang JJ, 2022, arXiv, DOI arXiv:2203.17054
  • [8] Huang JJ, 2022, arXiv, DOI arXiv:2112.11790
  • [9] Jiang YQ, 2022, arXiv, DOI arXiv:2206.15398
  • [10] PointPillars: Fast Encoders for Object Detection from Point Clouds
    Lang, Alex H.
    Vora, Sourabh
    Caesar, Holger
    Zhou, Lubing
    Yang, Jiong
    Beijbom, Oscar
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 12689 - 12697