FMCW Radar-Based Hand Gesture Recognition Using Spatiotemporal Deformable and Context-Aware Convolutional 5-D Feature Representation

Times cited: 29

Authors:

Dong, Xichao [1,2,3]

Zhao, Zewei [1,2]

Wang, Yupei [1,2]

Zeng, Tao [1,2]

Wang, Jianping [4]

Sui, Yi [1,2]

Affiliations:

[1] School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China

[2] Key Laboratory of Electronic and Information Technology in Satellite Navigation (Ministry of Education), Beijing Institute of Technology, Beijing 100081, China

[3] Chongqing Key Laboratory of Novel Civilian Radar, Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401120, China

[4] Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS), Delft University of Technology, NL-2628 CD Delft, Netherlands

Source:

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING | 2022 / Vol. 60

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

Azimuth; Feature extraction; Spatiotemporal phenomena; Convolution; Three-dimensional displays; Estimation; Doppler effect; Frequency-modulated continuous-wave (FMCW) radar; hand gesture recognition (HGR); spatiotemporal context modeling; spatiotemporal deformable convolution (STDC); DOPPLER-RADAR;

DOI:

10.1109/TGRS.2021.3122332

Chinese Library Classification (CLC) number:

P3 [Geophysics]; P59 [Geochemistry]

Discipline classification codes:

0708; 070902

Abstract:

Recently, frequency-modulated continuous-wave (FMCW) radar-based hand gesture recognition (HGR) using deep learning has achieved favorable performance. However, many existing methods use the extracted features separately, i.e., they train convolutional neural networks (CNNs) on only one of the range, Doppler, azimuth, or elevation angle information, or on a combination of any two, and thus ignore the interrelation within the 5-D time-varying range-Doppler-azimuth-elevation feature space. Although some methods do use the 5-D information, they do not mine the interrelation within the 5-D feature space sufficiently, and there is still room for improvement. This article proposes a new HGR processing scheme based on 5-D feature cubes that are jointly encoded by a 3-D fast Fourier transform (3-D-FFT)-based method. A CNN is then built from two novel blocks: the spatiotemporal deformable convolution (STDC) block and the adaptive spatiotemporal context-aware convolution (ASTCAC) block. Concretely, STDC is designed to cope with the large spatiotemporal geometric transformations of hand gestures in the 5-D feature space. ASTCAC is designed to model long-range global relationships, e.g., between feature pixels at the upper-left and lower-right corners, and to exploit the global spatiotemporal context, thereby enhancing the target feature representation and suppressing interference. Finally, the proposed method is verified on a large radar dataset of 19,760 sets of 16 common hand gestures collected from 19 subjects. It achieves a recognition rate of 99.53% on the validation set and 97.22% on the test set, significantly outperforming state-of-the-art methods.
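The abstract says only that the 5-D feature cubes are "jointly encoded by a 3-D-FFT-based method"; the exact encoding is not given in this record. As a rough illustration of the standard building block, the sketch below applies a generic windowed range-Doppler-azimuth 3-D FFT to one synthetic FMCW frame and stacks the per-frame maps over time. Everything here is an assumption, not the authors' code: the array sizes, the Hann windows, the single azimuth row of virtual antennas, and the hypothetical helper names `simulate_frame`, `range_doppler_angle`, and `time_varying_cube`. An elevation axis would presumably be handled analogously over a vertical antenna dimension.

```python
import numpy as np


def simulate_frame(n_samples=64, n_chirps=32, n_ant=8,
                   range_bin=10, doppler_bin=5, angle_bin=2):
    """Hypothetical noise-free FMCW frame with shape (fast time, chirps, antennas).

    A single point target is modeled as a 3-D complex exponential whose
    beat frequency (range), chirp-to-chirp phase (Doppler) and
    antenna-to-antenna phase (azimuth) fall exactly on FFT bins.
    """
    n = np.arange(n_samples)[:, None, None]  # fast-time sample index
    m = np.arange(n_chirps)[None, :, None]   # chirp (slow-time) index
    k = np.arange(n_ant)[None, None, :]      # virtual antenna index (azimuth)
    phase = (range_bin * n / n_samples
             + doppler_bin * m / n_chirps
             + angle_bin * k / n_ant)
    return np.exp(2j * np.pi * phase)


def range_doppler_angle(frame, n_angle=32):
    """Generic windowed 3-D FFT giving a |range x Doppler x azimuth| map."""
    n_samples, n_chirps, n_ant = frame.shape
    # Separable Hann window to suppress sidelobes along all three axes
    window = (np.hanning(n_samples)[:, None, None]
              * np.hanning(n_chirps)[None, :, None]
              * np.hanning(n_ant)[None, None, :])
    cube = np.fft.fft(frame * window, axis=0)                 # range FFT
    cube = np.fft.fftshift(np.fft.fft(cube, axis=1), axes=1)  # Doppler FFT, zero velocity centered
    cube = np.fft.fftshift(np.fft.fft(cube, n=n_angle, axis=2), axes=2)  # zero-padded azimuth FFT
    return np.abs(cube)


def time_varying_cube(frames, n_angle=32):
    """Stack per-frame maps into a (time, range, Doppler, azimuth) tensor."""
    return np.stack([range_doppler_angle(f, n_angle) for f in frames], axis=0)


if __name__ == "__main__":
    # Synthetic target whose range bin increases by one per frame
    frames = [simulate_frame(range_bin=10 + t) for t in range(4)]
    seq = time_varying_cube(frames)
    print(seq.shape)  # (4, 64, 32, 32)
    peaks = [np.unravel_index(np.argmax(s), s.shape) for s in seq]
    print([tuple(int(i) for i in p) for p in peaks])
    # [(10, 21, 24), (11, 21, 24), (12, 21, 24), (13, 21, 24)]
```

After `fftshift`, Doppler bin 5 appears at index 21 (zero velocity sits at index 16), and the azimuth tone at 2/8 cycles per antenna lands on bin 8 of the 32-point FFT, shifted to index 24. Zero-padding the 8-element azimuth FFT to 32 points only interpolates the angle spectrum; it does not improve the angular resolution set by the array aperture.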

Pages: 11