Effective mmWave Radar Object Detection Pretraining Based on Masked Image Modeling

Times cited: 3
Authors
Zhuang, Long [1]
Jiang, Tiezhen [1]
Wang, Jianhua [1]
An, Qi [1]
Xiao, Kai [1]
Wang, Anqi [1]
Affiliation
[1] Anhui University, School of Integrated Circuits, Hefei 230039, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Radar; Radar imaging; Millimeter wave communication; Task analysis; Feature extraction; Radar detection; Object detection; Environmental perception; masked image modeling (MIM); millimeter-wave (mmWave) radar; radar object detection (ROD)
DOI
10.1109/JSEN.2023.3339651
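Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809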
Abstract
With the advancement of environmental perception technology, millimeter-wave (mmWave) radar is emerging as a predominant sensor. Although deep learning has driven the development of mmWave radar object detection (ROD) techniques, mmWave ROD is constrained by limited datasets, because annotating mmWave radar data is inherently more complex. Motivated by masked image modeling (MIM), this article proposes a novel pretraining method for ROD to alleviate this data limitation. The method masks mmWave radar images from both spatial and temporal perspectives and then applies a simple image-reconstruction pretext task. To the best of the authors' knowledge, this is the first application of MIM-based self-supervision to ROD. Additionally, we design a lightweight self-supervised ROD network (SS-RODNet). Extensive ablation experiments demonstrate the effectiveness of the proposed method. The pretrained SS-RODNet achieves results comparable to the state of the art (SOTA) on the CRUW and CARRADA datasets with fewer parameters and floating-point operations (FLOPs).
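The abstract describes pretraining by masking radar images spatially and temporally and then reconstructing them. The sketch below illustrates that general idea in plain Python under assumptions the record does not state: a clip is a T x H x W list of frames (for example, range-azimuth heatmaps), spatial masking hides a random set of square patches shared by all frames, temporal masking hides whole frames, and the reconstruction loss is a mean squared error computed only on masked cells, as in MAE-style MIM. The function names, patch size, and masking ratios are hypothetical and are not taken from the paper's SS-RODNet implementation.

```python
import random


def make_mask(T, H, W, patch=4, spatial_ratio=0.5, temporal_ratio=0.25, seed=0):
    """Return a T x H x W nested list of bools; True marks a masked cell.

    Hypothetical scheme (an assumption, not the paper's): spatial masking hides
    a random subset of patch x patch blocks, shared by every frame, and
    temporal masking hides entire frames.
    """
    if H % patch or W % patch:
        raise ValueError("H and W must be divisible by patch")
    rng = random.Random(seed)
    # Spatial masking: pick which patches to hide.
    patches = [(i, j) for i in range(H // patch) for j in range(W // patch)]
    hidden = set(rng.sample(patches, int(round(spatial_ratio * len(patches)))))
    # Temporal masking: pick which whole frames to hide.
    hidden_frames = set(rng.sample(range(T), int(round(temporal_ratio * T))))
    return [[[t in hidden_frames or (r // patch, c // patch) in hidden
              for c in range(W)] for r in range(H)] for t in range(T)]


def apply_mask(clip, mask, fill=0.0):
    """Replace masked cells with `fill` to build the encoder input."""
    return [[[fill if m else x for x, m in zip(row, mrow)]
             for row, mrow in zip(frame, mframe)]
            for frame, mframe in zip(clip, mask)]


def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over masked cells only."""
    se, n = 0.0, 0
    for pf, tf, mf in zip(pred, target, mask):
        for pr, tr, mr in zip(pf, tf, mf):
            for p, t, m in zip(pr, tr, mr):
                if m:
                    se += (p - t) ** 2
                    n += 1
    return se / n if n else 0.0


# Toy clip: 4 frames of 8 x 8 synthetic "radar heatmaps".
clip = [[[float(t + r + c) for c in range(8)] for r in range(8)] for t in range(4)]
mask = make_mask(4, 8, 8)
x = apply_mask(clip, mask)
n_masked = sum(m for f in mask for row in f for m in row)
print(n_masked)  # 160: one full frame (64) + 2 of 4 patches in 3 frames (3 * 32)
```

In an actual pretraining loop, a network would receive `x` and predict the full clip, and `masked_mse` would score the prediction against `clip`. Sharing one spatial pattern across frames is one plausible choice: a hidden region then cannot be copied from a neighboring frame, while fully hidden frames push the model to use temporal context.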
Pages: 3999–4010
Number of pages: 12