CR-DINO: A Novel Camera-Radar Fusion 2-D Object Detection Model Based on Transformer

Cited by: 3
Authors
Jin, Yuhao [1]
Zhu, Xiaohui [1]
Yue, Yong [1]
Lim, Eng Gee [1]
Wang, Wei [2]
Affiliations
[1] Xi'an Jiaotong-Liverpool University, School of Advanced Technology, Suzhou 215000, People's Republic of China
[2] Hebei Normal University, College of Computer and Cyber Security, Shijiazhuang 050024, Hebei, People's Republic of China
Keywords
Autonomous vehicle; deep learning; multisensor fusion; object detection; transformer
DOI
10.1109/JSEN.2024.3357775
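Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809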
Abstract
Millimeter-wave (MMW) radar has been widely employed in autonomous driving because it directly acquires the spatial positions and velocities of objects and performs robustly in adverse weather. However, radar lacks semantic information. To address this limitation, we exploit the complementary strengths of camera and radar through feature-level fusion and propose a fully transformer-based model for object detection in autonomous driving. Specifically, we introduce a novel radar representation method and propose two camera-radar fusion architectures based on the Swin transformer. We name the proposed model camera-radar based DETR with improved denoising anchor boxes (CR-DINO) and train and test it on the nuScenes dataset. Across several ablation experiments, our best result was an mAP of 38.0%, surpassing other state-of-the-art (SOTA) camera-radar fusion object detection models.
Pages: 11080-11090
Page count: 11