Cross-view Transformers for real-time Map-view Semantic Segmentation

Cited by: 189

Authors:

Zhou, Brady [1]

Krahenbuhl, Philipp [1]

Affiliations:

[1] UT Austin, Austin, TX 78712 USA

Source:

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | 2022

Funding:

U.S. National Science Foundation

DOI:

10.1109/CVPR52688.2022.01339

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in real time. The presented architecture performs at state-of-the-art on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/cross_view_transformers.
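The camera-aware attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes one plausible form of calibration-dependent positional embedding: each pixel's viewing-ray direction, computed from the intrinsics `K` and a rotation `R`, is linearly projected and added to the image features before a single-head cross-attention from map-view queries. The function names, the projection `W_pos`, and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def pixel_ray_directions(K, R, height, width):
    """Unit viewing-ray direction of every pixel, expressed in the ego frame.

    Assumption: K is a 3x3 intrinsic matrix and R is a 3x3 rotation that maps
    ego-frame vectors into the camera frame, so the ray is R^T K^-1 [u, v, 1].
    """
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Row-vector form of R^T K^-1 x for all pixels at once.
    rays = pixels @ np.linalg.inv(K).T @ R
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(map_queries, image_features, ray_dirs, W_pos):
    """Single-head cross-attention from map-view queries to image features.

    Keys are image features plus a calibration-dependent embedding, so the
    attention weights can depend on where each pixel looks in the ego frame.
    """
    keys = image_features + ray_dirs @ W_pos                  # (N, D)
    scores = map_queries @ keys.T / np.sqrt(keys.shape[-1])   # (M, N)
    weights = softmax(scores, axis=-1)
    return weights @ image_features, weights


# Toy usage with random features; all sizes are illustrative.
rng = np.random.default_rng(0)
H, W, D, M = 4, 6, 8, 5
K = np.array([[4.0, 0.0, 3.0], [0.0, 4.0, 2.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
rays = pixel_ray_directions(K, R, H, W)
features = rng.normal(size=(H * W, D))
W_pos = rng.normal(size=(3, D))
queries = rng.normal(size=(M, D))
map_features, attention = cross_view_attention(queries, features, rays, W_pos)
print(map_features.shape, attention.shape)
```

With several cameras, one would compute `ray_dirs` per camera from its own calibration and concatenate the keys and features of all views along the first axis, so that each map-view query attends over every camera at once.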

Citation

Pages: 13750-13759

Page count: 10

