BAEFormer: Bi-directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation

Cited by: 6

Authors:

Pan, Cong ^{[1,2,3]}

He, Yonghao ^{[3]}

Peng, Junran ^{[4]}

Zhang, Qian ^{[3]}

Sui, Wei ^{[3]}

Zhang, Zhaoxiang ^{[1,2,5]}

Affiliations:

[1] Chinese Academy of Sciences, Institute of Automation, National Laboratory of Pattern Recognition, Beijing, People's Republic of China

[2] University of Chinese Academy of Sciences, School of Future Technology, Beijing, People's Republic of China

[3] Horizon Robotics, Beijing, People's Republic of China

[4] Huawei Inc, Shenzhen, Guangdong, People's Republic of China

[5] HKISI CAS, Centre for Artificial Intelligence and Robotics, Beijing, People's Republic of China

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

DOI:

10.1109/CVPR52729.2023.00925

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bird's Eye View (BEV) semantic segmentation is a critical task in autonomous driving. However, existing Transformer-based methods confront difficulties in transforming Perspective View (PV) to BEV due to their unidirectional and posterior interaction mechanisms. To address this issue, we propose a novel Bi-directional and Early Interaction Transformers framework named BAEFormer, consisting of (i) an early-interaction PV-BEV pipeline and (ii) a bi-directional cross-attention mechanism. Moreover, we find that the image feature maps' resolution in the cross-attention module has a limited effect on the final performance. Based on this observation, we propose to enlarge the size of input images and downsample the multiview image features for cross-interaction, further improving the accuracy while keeping the amount of computation controllable. Our proposed method for BEV semantic segmentation achieves state-of-the-art performance at real-time inference speed on the nuScenes dataset, i.e., 38.9 mIoU at 45 FPS on a single A100 GPU.
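
The computational argument in the abstract (larger input images, but downsampled features for cross-interaction) can be illustrated with a rough count of query-key pairs in one cross-attention pass. This is only a sketch: the function name, the BEV grid size, the image sizes, the number of views, and the strides below are hypothetical values chosen for illustration, not figures taken from the paper.

```python
def cross_attention_pairs(bev_h, bev_w, img_h, img_w, num_views, stride):
    """Count query-key pairs for one cross-attention pass.

    Each BEV query attends to every image token from every camera view.
    All arguments are illustrative assumptions, not values from the paper.
    """
    bev_tokens = bev_h * bev_w
    # The feature map is assumed to be the image downsampled by `stride`.
    img_tokens = num_views * (img_h // stride) * (img_w // stride)
    return bev_tokens * img_tokens


# Hypothetical baseline: small input images, stride-8 features.
small = cross_attention_pairs(25, 25, 224, 480, num_views=6, stride=8)

# Doubling the image side lengths at the same stride quadruples the cost.
large_same_stride = cross_attention_pairs(25, 25, 448, 960, num_views=6, stride=8)

# Downsampling the larger features by a further factor of 2 restores the cost.
large_downsampled = cross_attention_pairs(25, 25, 448, 960, num_views=6, stride=16)

print(small, large_same_stride, large_downsampled)
```

Under these assumed sizes, the attention cost of the enlarged input with extra downsampling matches that of the smaller input, which is the sense in which the computation stays controllable; the count ignores the backbone cost, which does grow with image size.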


Pages: 9590-9599

Page count: 10
