BAEFormer: Bi-directional and Early Interaction Transformers for Bird's Eye View Semantic Segmentation

Cited by: 6
Authors
Pan, Cong [1, 2, 3]
He, Yonghao [3]
Peng, Junran [4]
Zhang, Qian [3]
Sui, Wei [3]
Zhang, Zhaoxiang [1, 2, 5]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Future Technol, Beijing, Peoples R China
[3] Horizon Robot, Beijing, Peoples R China
[4] Huawei Inc, Shenzhen, Guangdong, Peoples R China
[5] HKISI CAS, Ctr Artificial Intelligence & Robot, Beijing, Peoples R China
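Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/CVPR52729.2023.00925
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract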
Bird's Eye View (BEV) semantic segmentation is a critical task in autonomous driving. However, existing Transformer-based methods struggle to transform the Perspective View (PV) into BEV because of their unidirectional and posterior interaction mechanisms. To address this issue, we propose a novel Bi-directional and Early Interaction Transformers framework named BAEFormer, consisting of (i) an early-interaction PV-BEV pipeline and (ii) a bi-directional cross-attention mechanism. Moreover, we find that the resolution of the image feature maps in the cross-attention module has a limited effect on final performance. Based on this observation, we enlarge the input images and downsample the multi-view image features used for cross-interaction, further improving accuracy while keeping the computational cost under control. For BEV semantic segmentation on the nuScenes dataset, our method achieves state-of-the-art performance at real-time inference speed, i.e., 38.9 mIoU at 45 FPS on a single A100 GPU.
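The bi-directional interaction and the downsampling of image features described in the abstract can be illustrated with a toy example. This is a hypothetical sketch, not the BAEFormer implementation: it assumes single-head scaled dot-product attention with no learned query/key/value projections, no positional encodings, a single camera, and non-overlapping average pooling as the downsampling step. The helper names (`cross_attention`, `avg_pool2d`, `bidirectional_block`) are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention (no learned projections).

    queries: (Nq, C); keys_values: (Nk, C). Each output row is a convex
    combination of the rows of keys_values.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (Nq, Nk)
    return softmax(scores, axis=-1) @ keys_values  # (Nq, C)


def avg_pool2d(feat, factor):
    """Downsample an (H, W, C) feature map by non-overlapping average pooling."""
    h, w, c = feat.shape
    assert h % factor == 0 and w % factor == 0, "H and W must divide by factor"
    blocks = feat.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def bidirectional_block(bev, pv, pool=2):
    """One hypothetical bi-directional PV <-> BEV interaction step.

    bev: (Nb, C) BEV tokens; pv: (H, W, C) image feature map.
    Returns updated BEV tokens and updated (downsampled) PV tokens.
    """
    # Downsample the image features first so attention sees fewer PV tokens.
    pv_tokens = avg_pool2d(pv, pool).reshape(-1, pv.shape[-1])
    # PV -> BEV: BEV queries gather information from image tokens.
    bev_new = bev + cross_attention(bev, pv_tokens)
    # BEV -> PV: image tokens are refined using the updated BEV tokens.
    pv_new = pv_tokens + cross_attention(pv_tokens, bev_new)
    return bev_new, pv_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bev = rng.standard_normal((100, 16))  # 100 BEV tokens, 16 channels
    pv = rng.standard_normal((8, 16, 16))  # 8x16 image feature map
    bev_out, pv_out = bidirectional_block(bev, pv, pool=2)
    print(bev_out.shape, pv_out.shape)
```

With 2× pooling, the 8×16 feature map shrinks from 128 to 32 tokens, so the PV-to-BEV attention over 100 BEV queries computes 3,200 scores instead of 12,800, a 4× reduction. Savings of this kind are what make it affordable to enlarge the input images; the pooling factors actually used in the paper are not given in this record.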
Pages: 9590-9599
Number of pages: 10