Cross-view Transformers for real-time Map-view Semantic Segmentation

Cited by: 189

Authors:

Zhou, Brady [1]

Krahenbuhl, Philipp [1]

Affiliations:

[1] UT Austin, Austin, TX 78712 USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022

Funding:

U.S. National Science Foundation

DOI:

10.1109/CVPR52688.2022.01339

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in real time. The presented architecture performs at state-of-the-art on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/cross_view_transformers.

Pages: 13750-13759

Page count: 10
