Cross-view Transformers for real-time Map-view Semantic Segmentation

被引:189
作者
Zhou, Brady [1 ]
Krahenbuhl, Philipp [1 ]
机构
[1] UT Austin, Austin, TX 78712 USA
来源
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022年
基金
美国国家科学基金会;
关键词
D O I
10.1109/CVPR52688.2022.01339
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present cross-view transformers, an efficientattention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in realtime. The presented architecture performs at state-of-theart on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/ cross_view_transformers.
引用
收藏
页码:13750 / 13759
页数:10
相关论文
共 48 条
[11]  
Cordts Marius, 2015, CVPR WORKSH FUT DAT, V2, P2
[12]   Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis [J].
Dai, Angela ;
Qi, Charles Ruizhongtai ;
Niessner, Matthias .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6545-6554
[13]  
Eigen D, 2014, ADV NEUR IN, V27
[14]  
Frahm JM, 2010, LECT NOTES COMPUT SC, V6314, P368, DOI 10.1007/978-3-642-15561-1_27
[15]   Deep Ordinal Regression Network for Monocular Depth Estimation [J].
Fu, Huan ;
Gong, Mingming ;
Wang, Chaohui ;
Batmanghelich, Kayhan ;
Tao, Dacheng .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :2002-2011
[16]  
Garnett Noa, 2019, ICCV
[17]  
Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074
[18]  
Godard C., 2019, CVPR
[19]   Unsupervised Monocular Depth Estimation with Left-Right Consistency [J].
Godard, Clement ;
Mac Aodha, Oisin ;
Brostow, Gabriel J. .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6602-6611
[20]  
Houston J., 2021, CORL