CR-DINO: A Novel Camera-Radar Fusion 2-D Object Detection Model Based on Transformer

Cited by: 6

Authors:

Jin, Yuhao [1]

Zhu, Xiaohui [1]

Yue, Yong [1]

Lim, Eng Gee [1]

Wang, Wei [2]

Affiliations:

[1] Xian Jiaotong Liverpool Univ, Sch Adv Technol, Suzhou 215000, Peoples R China

[2] Hebei Normal Univ, Coll Comp & Cyber Secur, Shijiazhuang 050024, Hebei, Peoples R China

Source:

IEEE SENSORS JOURNAL | 2024 / Vol. 24 / No. 07

Keywords:

Autonomous vehicle; deep learning; multisensor fusion; object detection; transformer;

DOI:

10.1109/JSEN.2024.3357775

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Millimeter-wave (MMW) radar is widely employed in autonomous driving because it directly acquires the spatial position and velocity of objects and performs robustly in adverse weather. However, radar lacks specific semantic information. To address this limitation, we exploit the complementary strengths of camera and radar through feature-level fusion and propose a fully transformer-based model for object detection in autonomous driving. Specifically, we introduce a novel radar representation method and propose two camera-radar fusion architectures based on the Swin transformer. We name the proposed model camera-radar based DETR with improved denoising anchor boxes (CR-DINO) and train and test it on the nuScenes dataset. Across several ablation experiments, the best result we obtained was an mAP of 38.0%, surpassing other state-of-the-art (SOTA) camera-radar fusion object detection models.

Pages: 11080-11090

Page count: 11
