DAFDeTr: Deformable Attention Fusion Based 3D Detection Transformer

被引：2

作者：

Erabati, Gopi Krishna ^{[1
]}

Araujo, Helder ^{[1
]}

机构：

[1] Univ Coimbra, Inst Syst & Robot, P-3030290 Coimbra, Portugal

来源：

ROBOTICS, COMPUTER VISION AND INTELLIGENT SYSTEMS, ROBOVIS 2024 | 2024年 / 2077卷

关键词：

3D Object detection; Transformer; Attention; LiDAR; Fusion;

D O I：

10.1007/978-3-031-59057-3_19

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Existing approaches fuse the LiDAR points and image pixels by hard association relying on highly accurate calibration matrices. We propose Deformable Attention Fusion based 3D Detection Transformer (DAFDeTr) to attentively and adaptively fuse the image features to the LiDAR features with soft association using deformable attention mechanism. Specifically, our detection head consists of two decoders for sequential fusion: LiDAR and image decoder powered by deformable cross-attention to link the multi-modal features to the 3D object predictions leveraging a sparse set of object queries. The refined object queries from the LiDAR decoder attentively fuse with the corresponding and required image features establishing a soft association, thereby making our model robust for any camera malfunction. We conduct extensive experiments and analysis on nuScenes and Waymo datasets. Our DAFDeTr-L achieves 63.4 mAP and outperforms well established networks on the nuScenes dataset and obtains competitive performance on the Waymo dataset. Our fusion model DAFDeTr achieves 64.6 mAP on the nuScenes dataset. We also extend our model to the 3D tracking task and our model outperforms state-of-the-art methods on 3D tracking.

引用

页码：293 / 315

页数：23

共 50 条

[1] TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [J].

Bai, Xuyang ;

Hu, Zeyu ;

Zhu, Xinge ;

Huang, Qingqiu ;

Chen, Yilun ;

Fu, Hangbo ;

Tai, Chiew-Lan .

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :1080-1089

[2] nuScenes: A multimodal dataset for autonomous driving [J].

Caesar, Holger ;

Bankiti, Varun ;

Lang, Alex H. ;

Vora, Sourabh ;

Liong, Venice Erin ;

Xu, Qiang ;

Krishnan, Anush ;

Pan, Yu ;

Baldan, Giancarlo ;

Beijbom, Oscar .

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11618-11628

[3] End-to-End Object Detection with Transformers [J].

Carion, Nicolas ;

Massa, Francisco ;

Synnaeve, Gabriel ;

Usunier, Nicolas ;

Kirillov, Alexander ;

Zagoruyko, Sergey .

COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229

[4]

Chen Q., 2021, P ADV NEUR INF PROC, V34, P26871

[5]

Chen Q., 2020, Adv. Neural Inf. Process. Syst., V33, P21224

[6] Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots [J].

Chen, Qi ;

Sun, Lin ;

Wang, Zhixin ;

Jia, Kui ;

Yuille, Alan .

COMPUTER VISION - ECCV 2020, PT XXI, 2020, 12366 :68-84

[7] Multi-View 3D Object Detection Network for Autonomous Driving [J].

Chen, Xiaozhi ;

Ma, Huimin ;

Wan, Ji ;

Li, Bo ;

Xia, Tian .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534

[8]

Chiu HK, 2020, Arxiv, DOI arXiv:2001.05673

[9]

Contributors M., 2020, MMDetection3D: OpenMMLab next-generation platform for general 3D object detection

[10] Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis [J].

Dai, Angela ;

Qi, Charles Ruizhongtai ;

Niessner, Matthias .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6545-6554

← 1 2 3 4 5 →