Cross-view Transformers for real-time Map-view Semantic Segmentation

Cited by: 189
Authors
Zhou, Brady [1]
Krahenbuhl, Philipp [1]
Affiliations
[1] UT Austin, Austin, TX 78712 USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022
Funding
National Science Foundation (USA)
Keywords
DOI
10.1109/CVPR52688.2022.01339
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in real time. The presented architecture performs at state-of-the-art on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/cross_view_transformers.
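The camera-aware attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, for illustration only, that the positional embedding is a linear projection of each pixel's viewing direction computed from the intrinsics K and a camera-to-vehicle rotation R, and that a single attention head is used. The names camera_ray_directions, cross_view_attention, and W_pos are hypothetical, and all weights and features are random placeholders.

```python
import numpy as np

def camera_ray_directions(K, R, height, width):
    """Unit viewing direction of every pixel centre, expressed in the vehicle frame.

    K is the 3x3 intrinsic matrix; R is an assumed camera-to-vehicle rotation.
    """
    ys, xs = np.meshgrid(np.arange(height) + 0.5, np.arange(width) + 0.5, indexing="ij")
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)  # (H*W, 3)
    rays = pixels @ np.linalg.inv(K).T @ R.T  # row-vector form of R @ K^-1 @ p
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(map_queries, image_features, ray_dirs, W_pos):
    """Single-head attention from map-view queries to image features of all cameras.

    map_queries: (Q, D), image_features: (N, D), ray_dirs: (N, 3), W_pos: (3, D).
    """
    keys = image_features + ray_dirs @ W_pos  # camera-aware positional embedding
    scores = map_queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)  # each map cell distributes attention over pixels
    return weights @ image_features, weights

# Toy setup: two cameras with shared intrinsics, facing forward and backward.
rng = np.random.default_rng(0)
D, H, W = 16, 4, 6
K = np.array([[5.0, 0.0, 3.0], [0.0, 5.0, 2.0], [0.0, 0.0, 1.0]])
R_front = np.eye(3)
R_back = np.diag([-1.0, 1.0, -1.0])  # 180 degree rotation about the y axis

rays = np.concatenate([camera_ray_directions(K, R, H, W) for R in (R_front, R_back)])
features = rng.normal(size=(rays.shape[0], D))  # stand-in for CNN encoder outputs
map_queries = rng.normal(size=(25, D))          # stand-in for a 5x5 map-view grid
W_pos = rng.normal(size=(3, D))                 # stand-in for learned embedding weights

bev, weights = cross_view_attention(map_queries, features, rays, W_pos)
print(bev.shape, weights.shape)
```

Because the keys carry calibration-dependent direction information, the same image feature is keyed differently depending on which camera and pixel it comes from. In a trained model this is what would let attention learn the view-to-map correspondence without an explicit geometric projection.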
Pages: 13750-13759
Page count: 10
Related papers
48 in total
[1]   Abbas Syed Ammar, 2019, ICCV WORKSH
[2]   Agarwal Sameer, 2011, COMMUNICATIONS ACM
[3]   [Anonymous], 2020, CVPR, DOI 10.1109/CVPR42600.2020.00466
[4]   [Anonymous], 2019, CVPR, DOI 10.1109/CVPR.2019.01298
[5]   [Anonymous], 2020, WACV
[6]   [Anonymous], 1981, Nature
[7]   [Anonymous], 2019, CVPR, DOI 10.1109/CVPR.2019.00217
[8]   [Anonymous], 2019, ICML
[9]   nuScenes: A multimodal dataset for autonomous driving [J].
Caesar, Holger ;
Bankiti, Varun ;
Lang, Alex H. ;
Vora, Sourabh ;
Liong, Venice Erin ;
Xu, Qiang ;
Krishnan, Anush ;
Pan, Yu ;
Baldan, Giancarlo ;
Beijbom, Oscar .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11618-11628
[10]   Argoverse: 3D Tracking and Forecasting with Rich Maps [J].
Chang, Ming-Fang ;
Lambert, John ;
Sangkloy, Patsorn ;
Singh, Jagjeet ;
Bak, Slawomir ;
Hartnett, Andrew ;
Wang, De ;
Carr, Peter ;
Lucey, Simon ;
Ramanan, Deva ;
Hays, James .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :8740-8749