mmPose-NLP: A Natural Language Processing Approach to Precise Skeletal Pose Estimation Using mmWave Radars

Cited by: 51

Authors:

Sengupta, Arindam [1]

Cao, Siyang [1]

Affiliations:

[1] Univ Arizona, Dept Elect & Comp Engn, Tucson, AZ 85721 USA

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2023 / Vol. 34 / No. 11

Keywords:

Radar; Pose estimation; Optical sensors; Estimation; Antenna arrays; Lighting; Doppler radar; Gated recurrent unit (GRU); millimeter-wave (mmWave) radars; natural language processing (NLP); point cloud (PCL); pose estimation; sequence-to-sequence (Seq2Seq); skeletal key points; skeletal pose; GAIT ANALYSIS; KINECT;

DOI:

10.1109/TNNLS.2022.3151101

Chinese Library Classification:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, we present mmPose-NLP, a novel natural language processing (NLP) inspired sequence-to-sequence (Seq2Seq) skeletal key-point estimator using millimeter-wave (mmWave) radar data. To the best of our knowledge, this is the first method to precisely estimate up to 25 skeletal key points using mmWave radar data alone. Skeletal pose estimation is critical in applications ranging from autonomous vehicles, traffic monitoring, patient monitoring, and gait analysis to defense security forensics, where it aids both preventative and actionable decision making. The use of mmWave radars for this task, over traditionally employed optical sensors, provides several advantages, primarily operational robustness to scene lighting and adverse weather conditions, under which optical sensor performance degrades significantly. The mmWave radar point-cloud (PCL) data are first voxelized (analogous to tokenization in NLP), and N frames of the voxelized radar data (analogous to a text paragraph in NLP) are fed to the proposed mmPose-NLP architecture, which predicts the voxel indices of the 25 skeletal key points (analogous to keyword extraction in NLP). The voxel indices are converted back to real-world 3-D coordinates using the voxel dictionary built during the tokenization process. Mean absolute error (MAE) metrics were used to measure the accuracy of the proposed system against the ground truth, with the proposed mmPose-NLP offering <3 cm localization errors in the depth, horizontal, and vertical axes. The effect of the number of input frames on performance/accuracy was also studied for N = {1,2,...,10}. A comprehensive methodology, results, discussions, and limitations are presented in this article. All source code and results are made available on GitHub for further research and development in this critical yet emerging domain of skeletal key-point estimation using mmWave radars.
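
The voxelize-then-decode idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid origin, voxel size, grid shape, and all function names below are hypothetical choices made only for illustration, and the paper's actual voxel dictionary may be built differently. The sketch maps a 3-D point to a single integer voxel index (the "token"), maps an index back to the voxel center, and computes a per-axis mean absolute error.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (depth, horizontal, vertical) in metres

# Hypothetical grid parameters, not taken from the paper.
ORIGIN: Point = (0.0, -2.0, -1.5)   # minimum corner of the scene
VOXEL_SIZE = 0.05                   # voxel edge length in metres
GRID_SHAPE = (80, 80, 60)           # voxel count along each axis


def point_to_token(p: Point) -> int:
    """Map a 3-D point to one integer voxel index (the 'token')."""
    idx = []
    for coord, lo, n in zip(p, ORIGIN, GRID_SHAPE):
        i = int((coord - lo) // VOXEL_SIZE)
        idx.append(min(max(i, 0), n - 1))  # clamp points outside the grid
    ix, iy, iz = idx
    return (ix * GRID_SHAPE[1] + iy) * GRID_SHAPE[2] + iz


def token_to_point(token: int) -> Point:
    """Map a voxel index back to the 3-D coordinates of the voxel center."""
    ix, rem = divmod(token, GRID_SHAPE[1] * GRID_SHAPE[2])
    iy, iz = divmod(rem, GRID_SHAPE[2])
    x, y, z = (lo + (i + 0.5) * VOXEL_SIZE for lo, i in zip(ORIGIN, (ix, iy, iz)))
    return (x, y, z)


def mae_per_axis(pred: Sequence[Point], truth: Sequence[Point]) -> Point:
    """Mean absolute error along each axis over paired key points."""
    n = len(pred)
    dx, dy, dz = (
        sum(abs(p[a] - t[a]) for p, t in zip(pred, truth)) / n for a in range(3)
    )
    return (dx, dy, dz)
```

With this kind of scheme, a decoded key point can differ from the original by at most half a voxel edge along each axis, so the chosen voxel size bounds the best achievable localization error.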

Pages: 8418-8429

Page count: 12

