SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection From Multi-view Camera Images With Global Cross-Sensor Attention

Cited by: 10

Authors:

Doll, Simon [1,3]

Schulz, Richard [1]

Schneider, Lukas [1]

Benzin, Viviane [1]

Enzweiler, Markus [2]

Lensch, Hendrik P. A. [3]

Affiliations:

[1] Mercedes Benz, Stuttgart, Germany

[2] Esslingen Univ Appl Sci, Stuttgart, Germany

[3] Univ Tubingen, Tubingen, Germany

Source:

COMPUTER VISION, ECCV 2022, PT XXXIX | 2022 / Vol. 13699

Keywords:

3D object detection; Cross-sensor attention; Autonomous driving

DOI:

10.1007/978-3-031-19842-7_14

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Based on the key idea of DETR, this paper introduces an object-centric 3D object detection framework that operates on a limited number of 3D object queries instead of dense bounding box proposals followed by non-maximum suppression. After image feature extraction, a decoder-only transformer architecture is trained on a set-based loss. SpatialDETR infers the classification and bounding box estimates based on attention both spatially within each image and across the different views. To fuse the multi-view information in the attention block, we introduce a novel geometric positional encoding that incorporates the view ray geometry to explicitly consider the extrinsic and intrinsic camera setup. This way, the spatially-aware cross-view attention exploits arbitrary receptive fields to integrate cross-sensor data and therefore global context. Extensive experiments on the nuScenes benchmark demonstrate the potential of global attention and result in state-of-the-art performance. Code is available at https://github.com/cgtuebingen/SpatialDETR.

Pages: 230-245

Page count: 16
