Bird's eye view generation based on recurrent cross-view transformation and multi-state feature fusion

Cited by: 0
Authors
Liu, Mingjie [1 ]
He, Zhengyan [1 ]
Chen, Junsheng [1 ]
Liu, Ping [1 ]
Piao, Changhao [1 ]
Affiliations
[1] School of Automation, Chongqing University of Posts and Telecommunications, Chongqing
Source
Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument | 2024 / Vol. 45 / No. 10
Keywords
bird's eye view; light-weight Transformer; map-view transition; perspective view
DOI
10.19650/j.cnki.cjsi.J2412990
Abstract
To address semantic inconsistency in multi-state associated feature extraction and the difficulty of balancing model performance against complexity in most multi-perspective-view-based bird's eye view (BEV) generation methods, a light-weight Transformer-based BEV generation model is proposed. The method uses an end-to-end, one-stage training strategy to establish a mutual association between dynamic vehicle and static road information in traffic scenes, effectively filtering noise out of the generated BEV. A Transformer-based recurrent cross-view transformation module for multi-scale features is introduced to perform image encoding and representation learning; it improves the robustness of the extracted BEV features by capturing the location-dependent relationships in the perspective-view (PV) feature sequence. Additionally, a multi-state BEV feature fusion module is designed to resolve semantic inconsistencies by extracting correlated information between dynamic vehicles and static roads, thereby improving the quality of the generated BEVs. Experiments on the nuScenes dataset show that the method achieves advanced BEV generation performance at low model complexity, reaching 43.2% and 82.0% semantic segmentation accuracy for dynamic vehicles and static roads, respectively. © 2024 Science Press. All rights reserved.
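The abstract describes two architectural pieces: a recurrent cross-view transformation that maps multi-scale PV features into BEV space, and a multi-state fusion that exchanges context between the dynamic-vehicle and static-road BEV streams. The PyTorch sketch below is one plausible reading of that description, not the authors' implementation; every class name, shape, and hyper-parameter (RecurrentCrossViewTransform, MultiStateFusion, dim, num_iters, the gated blend) is an illustrative assumption.

import torch
import torch.nn as nn

class RecurrentCrossViewTransform(nn.Module):
    # Hypothetical recurrent cross-view transformation: a single shared
    # Transformer decoder layer is applied repeatedly (hence light-weight),
    # letting learnable BEV queries attend to PV features at every scale.
    def __init__(self, dim=128, bev_size=50, num_iters=3):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, dim))
        self.layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8,
                                                batch_first=True)
        self.num_iters = num_iters

    def forward(self, pv_feats):
        # pv_feats: list of PV feature maps, one per scale, each (B, C, H_i, W_i)
        b = pv_feats[0].shape[0]
        bev = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        for _ in range(self.num_iters):            # recurrent refinement
            for feat in pv_feats:                  # visit every scale
                tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
                bev = self.layer(bev, tokens)      # BEV queries cross-attend to PV tokens
        return bev                                 # (B, bev_size**2, C)

class MultiStateFusion(nn.Module):
    # Hypothetical multi-state fusion: a learned gate blends the dynamic
    # (vehicle) and static (road) BEV streams so that each branch can
    # borrow correlated context from the other.
    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, dyn_bev, sta_bev):
        g = self.gate(torch.cat([dyn_bev, sta_bev], dim=-1))
        return g * dyn_bev + (1 - g) * sta_bev     # element-wise gated blend

In this sketch, dyn_bev and sta_bev would be two task-specific projections of the shared BEV representation returned by the transformation module; the learned gate lets the road branch suppress vehicle clutter and vice versa, which matches the abstract's goal of filtering noise through the mutual association of dynamic and static information.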
Pages: 133-142
Number of pages: 9