SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection From Multi-view Camera Images With Global Cross-Sensor Attention

Cited by: 10
Authors
Doll, Simon [1, 3]
Schulz, Richard [1]
Schneider, Lukas [1]
Benzin, Viviane [1]
Enzweiler, Markus [2]
Lensch, Hendrik P. A. [3]
Affiliations
[1] Mercedes Benz, Stuttgart, Germany
[2] Esslingen Univ Appl Sci, Stuttgart, Germany
[3] Univ Tubingen, Tubingen, Germany
Keywords
3D object detection; Cross-sensor attention; Autonomous driving
DOI
10.1007/978-3-031-19842-7_14
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Based on the key idea of DETR, this paper introduces an object-centric 3D object detection framework that operates on a limited number of 3D object queries instead of dense bounding box proposals followed by non-maximum suppression. After image feature extraction, a decoder-only transformer architecture is trained with a set-based loss. SpatialDETR infers the classification and bounding box estimates based on attention both spatially within each image and across the different views. To fuse the multi-view information in the attention block, we introduce a novel geometric positional encoding that incorporates the view ray geometry to explicitly consider the extrinsic and intrinsic camera setup. This way, the spatially aware cross-view attention exploits arbitrary receptive fields to integrate cross-sensor data and therefore global context. Extensive experiments on the nuScenes benchmark demonstrate the potential of global attention and result in state-of-the-art performance. Code is available at https://github.com/cgtuebingen/SpatialDETR.
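The abstract names view ray geometry derived from the intrinsic and extrinsic camera setup as the basis of the positional encoding, but does not spell out how the encoding is built. The following is only a minimal sketch of the geometric ingredient, not the authors' implementation: it back-projects pixel centers through a pinhole intrinsic matrix and rotates the resulting directions into a shared reference frame. The function name `view_rays`, the parameter names, and the pinhole model without distortion are assumptions made for illustration.

```python
import numpy as np


def view_rays(K, R_cam_to_ref, height, width):
    """Unit view-ray directions per pixel, shape (height, width, 3).

    K            : (3, 3) pinhole intrinsic matrix (assumed, no distortion).
    R_cam_to_ref : (3, 3) rotation from camera frame to a shared reference frame.
    """
    # Pixel centers in homogeneous image coordinates (u, v, 1).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project through the inverse intrinsics: d_cam = K^-1 p.
    rays_cam = pixels @ np.linalg.inv(K).T

    # Rotate into the reference frame so rays from all cameras are comparable.
    rays_ref = rays_cam @ R_cam_to_ref.T

    # Normalize to unit length; only the direction matters here.
    return rays_ref / np.linalg.norm(rays_ref, axis=-1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical camera: focal length 100 px, principal point at (50.5, 50.5).
    K = np.array([[100.0, 0.0, 50.5],
                  [0.0, 100.0, 50.5],
                  [0.0, 0.0, 1.0]])
    rays = view_rays(K, np.eye(3), height=100, width=100)
    print(rays.shape)
```

Because the directions are expressed in one common frame, rays from different cameras that point at the same region of space become similar, which is the property a cross-view attention mechanism could exploit. How such rays are embedded and combined with image features is left to the paper and its released code.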
Pages: 230-245
Page count: 16