Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Cited by: 7
Authors
Li, Yangguang [1 ]
Huang, Bin [2 ]
Chen, Zeren [3 ]
Cui, Yufeng [3 ]
Liang, Feng [4 ]
Shen, Mingzhu [3 ]
Liu, Fenggang [1 ]
Xie, Enze [5 ]
Sheng, Lu [3 ]
Ouyang, Wanli [6 ]
Shao, Jing [6 ]
Affiliations
[1] SenseTime Res, Beijing 100080, Peoples R China
[2] Hozon New Energy Automobile Co Ltd, Shanghai 200062, Peoples R China
[3] Beihang Univ, Beijing 100191, Peoples R China
[4] Univ Texas Austin, Austin, TX 78712 USA
[5] Univ Hong Kong, Pok Fu Lam, Hong Kong, Peoples R China
[6] Shanghai AI Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-camera; bird's-eye view (BEV) representation; autonomous driving;
DOI
10.1109/TPAMI.2024.3414835
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, perception tasks based on the Bird's-Eye View (BEV) representation have drawn increasing attention, and the BEV representation is promising as the foundation of next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources for on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on on-vehicle chips. Toward this goal, we first empirically find that the BEV representation can be sufficiently powerful without an expensive transformer-based transformation or depth representation. Fast-BEV consists of five parts: we propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, and (3) an efficient BEV encoder specifically designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, and (5) a multi-frame feature fusion mechanism to leverage temporal information. Among them, (1) and (3) make Fast-BEV fast at inference and friendly to deployment on on-vehicle chips, while (2), (4), and (5) ensure that Fast-BEV has competitive performance. Together, these make Fast-BEV a solution with high performance, fast inference speed, and deployment friendliness on the on-vehicle chips of autonomous driving. In experiments on a 2080Ti platform, our R50 model runs at 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model (Li et al., 2022) and the 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model (J. Huang and G. Huang, 2022). Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips.
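The lightweight view transformation described in the abstract amounts to projecting predefined 3D voxel centers into each camera's feature map and gathering 2D features by index, without predicting per-pixel depth. The following is a minimal NumPy sketch of such a projection-based lookup, included only to illustrate the idea; the function names, voxel-grid layout, and nearest-pixel gathering are assumptions for illustration, not the authors' implementation, which may differ in how the lookup is precomputed and how features from multiple cameras are fused.

# Minimal, illustrative sketch (not the authors' code) of a projection-based
# 2D-to-3D view transformation: voxel centers are projected into the image with
# known camera extrinsics/intrinsics, and image features are gathered by lookup.
import numpy as np

def build_projection_lut(voxel_centers, cam_to_ego, intrinsics, img_hw):
    # For each voxel, precompute the flat pixel index it projects to
    # (-1 if it falls behind the camera or outside the feature map).
    # voxel_centers: (N, 3) voxel-center coordinates in the ego frame
    # cam_to_ego:    (4, 4) camera-to-ego extrinsic matrix
    # intrinsics:    (3, 3) camera intrinsic matrix
    # img_hw:        (H, W) size of the image feature map
    ego_to_cam = np.linalg.inv(cam_to_ego)
    pts = np.concatenate([voxel_centers, np.ones((len(voxel_centers), 1))], axis=1)
    cam_pts = (ego_to_cam @ pts.T).T[:, :3]            # voxels in the camera frame
    depth = cam_pts[:, 2]
    uvw = (intrinsics @ cam_pts.T).T                   # perspective projection
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(np.int64)
    v = np.round(uv[:, 1]).astype(np.int64)
    h, w = img_hw
    valid = (depth > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    lut = np.full(len(voxel_centers), -1, dtype=np.int64)
    lut[valid] = v[valid] * w + u[valid]               # flat pixel index per voxel
    return lut

def lift_features(img_feat, lut):
    # Gather 2D features into the voxel volume using the precomputed lookup table.
    # img_feat: (C, H, W) image feature map; returns (N, C) voxel features,
    # zeros where a voxel has no valid projection.
    c, h, w = img_feat.shape
    flat = img_feat.reshape(c, h * w)
    out = np.zeros((len(lut), c), dtype=img_feat.dtype)
    hit = lut >= 0
    out[hit] = flat[:, lut[hit]].T
    return out

Because the camera geometry is fixed for a given frame, the per-voxel pixel indices can be computed once and reused, so the on-vehicle runtime cost of lift_features reduces to a single gather; this is what makes this style of view transformation fast and deployment-friendly compared with transformer-based or depth-based alternatives.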
Pages: 8665-8679
Page count: 15
Related Papers
42 entries in total
  • [1] M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
    Brazil, Garrick
    Liu, Xiaoming
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 9286 - 9295
  • [2] Caesar H, 2020, PROC CVPR IEEE, P11618, DOI 10.1109/CVPR42600.2020.01164
  • [3] Chen ZH, 2022, arXiv, DOI arXiv:2204.11582
  • [4] Vision meets robotics: The KITTI dataset
    Geiger, A.
    Lenz, P.
    Stiller, C.
    Urtasun, R.
    [J]. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2013, 32 (11) : 1231 - 1237
  • [5] Deep Residual Learning for Image Recognition
    He, Kaiming
    Zhang, Xiangyu
    Ren, Shaoqing
    Sun, Jian
    [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 770 - 778
  • [6] Huang J., 2022, arXiv
  • [7] Huang JJ, 2022, arXiv, DOI arXiv:2203.17054
  • [8] Huang JJ, 2022, arXiv, DOI arXiv:2112.11790
  • [9] Jiang YQ, 2022, arXiv, DOI arXiv:2206.15398
  • [10] PointPillars: Fast Encoders for Object Detection from Point Clouds
    Lang, Alex H.
    Vora, Sourabh
    Caesar, Holger
    Zhou, Lubing
    Yang, Jiong
    Beijbom, Oscar
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 12689 - 12697