Effective mmWave Radar Object Detection Pretraining Based on Masked Image Modeling

Cited by: 3

Authors:

Zhuang, Long [1]

Jiang, Tiezhen [1]

Wang, Jianhua [1]

An, Qi [1]

Xiao, Kai [1]

Wang, Anqi [1]

Affiliations:

[1] Anhui Univ, Sch Integrated Circuits, Hefei 230039, Peoples R China

Source:

IEEE SENSORS JOURNAL | 2024 / Vol. 24 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Radar; Radar imaging; Millimeter wave communication; Task analysis; Feature extraction; Radar detection; Object detection; Environmental perception; masked image modeling (MIM); millimeter-wave (mmWave) radar; radar object detection (ROD);

DOI:

10.1109/JSEN.2023.3339651

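Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: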

0808 ; 0809 ;

Abstract:

With the advancement of environmental perception technology, millimeter-wave (mmWave) radar is emerging as a predominant sensor. While deep learning has facilitated the development of mmWave radar object detection (ROD) techniques, mmWave ROD is constrained by limited datasets because annotating mmWave radar data is inherently more complex. Motivated by masked image modeling (MIM), this article proposes a novel pretraining method for ROD to address the limitations posed by datasets. The method applies masking operations to mmWave radar images from both spatial and temporal perspectives, followed by a straightforward image reconstruction proxy task. To the best of the authors' knowledge, our method is the first application of the MIM self-supervision method to ROD tasks. Additionally, we design a lightweight self-supervised ROD network (SS-RODNet). Numerous ablation experiments demonstrate the effectiveness of the proposed method. The pretrained SS-RODNet attains results comparable to the state of the art (SOTA) on the CRUW and CARRADA datasets with fewer parameters and floating-point operations (FLOPs).
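
As a rough illustration of the masking idea described in the abstract, the following NumPy sketch hides random spatial patches and whole frames in a radar image sequence, then scores a reconstruction only on the hidden pixels. It is not the authors' implementation: the function names, the (T, H, W) input layout, the patch size, and the mask ratios are all assumptions chosen for illustration, and the "reconstruction" is a placeholder rather than a network output.

```python
import numpy as np

def spatial_mask(shape, patch, ratio, rng):
    """Boolean (T, H, W) mask hiding a random `ratio` of patch x patch tiles per frame.

    Hypothetical helper; assumes H and W are divisible by `patch`.
    """
    t, h, w = shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    n_tiles = gh * gw
    n_hide = int(round(ratio * n_tiles))
    tiles = np.zeros((t, n_tiles), dtype=bool)
    for i in range(t):
        hidden = rng.choice(n_tiles, size=n_hide, replace=False)
        tiles[i, hidden] = True
    # Expand the tile grid back to pixel resolution.
    tiles = tiles.reshape(t, gh, gw)
    return np.repeat(np.repeat(tiles, patch, axis=1), patch, axis=2)

def temporal_mask(shape, ratio, rng):
    """Boolean (T, H, W) mask hiding a random `ratio` of whole frames (hypothetical helper)."""
    t = shape[0]
    n_hide = int(round(ratio * t))
    hidden = rng.choice(t, size=n_hide, replace=False)
    mask = np.zeros(shape, dtype=bool)
    mask[hidden] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared reconstruction error computed over masked pixels only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

# Toy data standing in for a short sequence of radar frames (assumed shape).
rng = np.random.default_rng(0)
frames = rng.random((4, 32, 32)).astype(np.float32)

# Combine both views: a pixel is hidden if either mask hides it.
mask = spatial_mask(frames.shape, patch=8, ratio=0.5, rng=rng) \
     | temporal_mask(frames.shape, ratio=0.25, rng=rng)
masked_input = np.where(mask, 0.0, frames)

# Placeholder "reconstruction": the masked input itself, in place of a model.
loss = masked_mse(masked_input, frames, mask)
```

In a real pretraining setup the masked input would be passed through an encoder-decoder, and the loss above would be applied to its output instead of the placeholder.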

References

Pages: 3999-4010

Page count: 12

62 entries in total

[1] Application of Deep Learning on Millimeter-Wave Radar Signals: A Review
Abdu, Fahad Jibrin
Zhang, Yixiong
Fu, Maozhong
Li, Yuhan
Deng, Zhenmiao
[J]. SENSORS, 2021, 21 (06) : 1 - 46
[2] Akita T, 2019, IEEE INT C INTELL TR, P1110, DOI 10.1109/ITSC.2019.8917144
[3] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Chen, Liang-Chieh
Zhu, Yukun
Papandreou, George
Schroff, Florian
Adam, Hartwig
[J]. COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 833 - 851
[4] Chen T, 2020, PR MACH LEARN RES, V119
[5] Masked Image Modeling Advances 3D Medical Image Analysis
Chen, Zekai
Agarwal, Devansh
Aggarwal, Kshitij
Safta, Wiem
Balan, Mariann Micsinai
Brown, Kevin
[J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1969 - 1979
[6] Daelemans W., 2014, Learning phrase representations using RNN encoder-decoder for statistical machine translation, P1724, DOI 10.3115/V1/D14-1179
[7] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[8] Dosovitskiy A., 2021, arXiv
[9] Feichtenhofer C, 2022, arXiv:2205.09113, DOI 10.48550/arXiv.2205.09113
[10] RAMP-CNN: A Novel Neural Network for Enhanced Automotive Radar Object Recognition
Gao, Xiangyu
Xing, Guanbin
Roy, Sumit
Liu, Hui
[J]. IEEE SENSORS JOURNAL, 2021, 21 (04) : 5119 - 5132
