Fast-BEV: A Fast and Strong Birds-Eye View Perception Baseline

Cited by: 7
Authors
Li, Yangguang [1 ]
Huang, Bin [2 ]
Chen, Zeren [3 ]
Cui, Yufeng [3 ]
Liang, Feng [4 ]
Shen, Mingzhu [3 ]
Liu, Fenggang [1 ]
Xie, Enze [5 ]
Sheng, Lu [3 ]
Ouyang, Wanli [6 ]
Shao, Jing [6 ]
Affiliations
[1] SenseTime Res, Beijing 100080, Peoples R China
[2] Hozon New Energy Automobile Co Ltd, Shanghai 200062, Peoples R China
[3] Beihang Univ, Beijing 100191, Peoples R China
[4] Univ Texas Austin, Austin, TX 78712 USA
[5] Univ Hong Kong, Pok Fu Lam, Hong Kong, Peoples R China
[6] Shanghai AI Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-camera; bird's-eye view (BEV) representation; autonomous driving;
DOI
10.1109/TPAMI.2024.3414835
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, perception tasks based on the Bird's-Eye View (BEV) representation have drawn increasing attention, and the BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources for on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which performs faster BEV perception on on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without an expensive transformer-based transformation or depth representation. Fast-BEV consists of five parts: (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features into 3D voxel space; (2) a multi-scale image encoder that leverages multi-scale information for better performance; (3) an efficient BEV encoder specifically designed to speed up on-vehicle inference; (4) a strong data augmentation strategy in both image and BEV space to avoid over-fitting; and (5) a multi-frame feature fusion mechanism that leverages temporal information. Among them, (1) and (3) make Fast-BEV fast at inference and friendly to deploy on on-vehicle chips, while (2), (4), and (5) ensure competitive performance. Together, these make Fast-BEV a solution offering high performance, fast inference, and easy deployment on the on-vehicle chips of autonomous driving. In experiments on the 2080Ti platform, our R50 model runs at 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model (Li et al., 2022) and the 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model (J. Huang and G. Huang, 2022). Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set.
We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips.
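The abstract's key efficiency idea, a view transformation that transfers 2D image features into 3D voxel space without a transformer or per-pixel depth, can be illustrated with a precomputed lookup table: each voxel's projection into the image is computed once from the camera geometry, after which every frame needs only a single gather. This is a minimal single-camera sketch in numpy; all function names, shapes, and the projection setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_lut(K, T_cam_from_ego, voxel_centers, img_h, img_w):
    """Precompute, per voxel, the flat image-pixel index it projects to (-1 if none).

    K: (3,3) camera intrinsics; T_cam_from_ego: (4,4) ego-to-camera extrinsics.
    voxel_centers: (N,3) ego-frame voxel centers.
    The returned (N,) LUT depends only on geometry, so it is reusable across frames.
    """
    homo = np.concatenate([voxel_centers, np.ones((len(voxel_centers), 1))], axis=1)
    cam = (T_cam_from_ego @ homo.T).T[:, :3]              # ego frame -> camera frame
    lut = np.full(len(voxel_centers), -1, dtype=np.int64)
    infront = cam[:, 2] > 1e-3                             # keep points in front of the camera
    uv = (K @ cam[infront].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).round().astype(np.int64) # perspective divide -> pixel coords
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < img_w) & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    idx = np.flatnonzero(infront)[ok]
    lut[idx] = uv[ok, 1] * img_w + uv[ok, 0]               # flat row-major pixel index
    return lut

def view_transform(img_feat, lut):
    """Fill voxel features with one gather; no depth estimation at runtime.

    img_feat: (C, H, W) image feature map; lut: (N,) from build_lut.
    Returns (N, C) voxel features; voxels with no valid projection stay zero.
    """
    C, H, W = img_feat.shape
    flat = img_feat.reshape(C, H * W).T                    # (H*W, C) pixel features
    out = np.zeros((lut.shape[0], C), dtype=img_feat.dtype)
    hit = lut >= 0
    out[hit] = flat[lut[hit]]
    return out
```

Because the geometry is frozen into the table, the per-frame cost is a memory gather rather than a matrix projection, which is what makes this style of transformation attractive for on-vehicle chips.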
Pages: 8665 - 8679
Page count: 15
References
42 entries in total
  • [21] neuvition, 2022, LiDAR price for cars-neuvition: Solid-state LiDAR, LiDAR sensor suppliers, LiDAR technology, LiDAR sensor
  • [22] Is Pseudo-Lidar needed for Monocular 3D Object detection?
    Park, Dennis
    Ambrus, Rares
    Guizilini, Vitor
    Li, Jie
    Gaidon, Adrien
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3122 - 3132
  • [23] Philion Jonah, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12359), P194, DOI 10.1007/978-3-030-58568-6_12
  • [24] Categorical Depth Distribution Network for Monocular 3D Object Detection
    Reading, Cody
    Harakeh, Ali
    Chae, Julia
    Waslander, Steven L.
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8551 - 8560
  • [25] Ren S., 2015, P ADV NEUR INF PROC, P3122
  • [26] Roddick T, 2018, Arxiv, DOI arXiv:1811.08188
  • [27] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
    Rukhovich, Danila
    Vorontsova, Anna
    Konushin, Anton
    [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 1265 - 1274
  • [28] Saha A., 2021, arXiv
  • [29] PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud
    Shi, Shaoshuai
    Wang, Xiaogang
    Li, Hongsheng
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 770 - 779
  • [30] Scalability in Perception for Autonomous Driving: Waymo Open Dataset
    Sun, Pei
    Kretzschmar, Henrik
    Dotiwalla, Xerxes
    Chouard, Aurelien
    Patnaik, Vijaysai
    Tsui, Paul
    Guo, James
    Zhou, Yin
    Chai, Yuning
    Caine, Benjamin
    Vasudevan, Vijay
    Han, Wei
    Ngiam, Jiquan
    Zhao, Hang
    Timofeev, Aleksei
    Ettinger, Scott
    Krivokon, Maxim
    Gao, Amy
    Joshi, Aditya
    Zhang, Yu
    Shlens, Jonathon
    Chen, Zhifeng
    Anguelov, Dragomir
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2443 - 2451