Dynamic Gesture Recognition Network Based on Multiscale Spatiotemporal Feature Fusion

Cited by: 2

Authors:

Liu, Jie [1]

Wang, Yue [1]

Tian, Ming [2]

Affiliations:

[1] Harbin Univ Sci & Technol, Harbin 150080, Peoples R China

[2] China Telecom Heilongjiang Branch, Harbin 150040, Peoples R China

Source:

JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY | 2023 / Vol. 45 / No. 07

Keywords:

Dynamic gesture recognition; Deep learning; Convolutional vision Transformer (CvT); Multiscale fusion;

DOI:

10.11999/JEIT220758

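Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: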

0808 ; 0809 ;

Abstract:

Because dynamic gesture data are complex in both time and space, traditional machine learning algorithms have difficulty extracting accurate gesture features, while existing dynamic gesture recognition algorithms suffer from complex network designs, large numbers of parameters, and insufficient gesture feature extraction. To address these problems, a multiscale spatiotemporal feature fusion network based on the Convolutional vision Transformer (CvT) is proposed. First, the CvT network, originally used for image classification, is introduced into dynamic gesture classification: it extracts the spatial features of each single gesture image and fuses shallow and deep features at different spatial scales. Second, a multi-time-scale aggregation module is designed to extract the spatiotemporal features of dynamic gestures, and the CvT network is combined with this module to suppress invalid features. Finally, to compensate for the shortcomings of the dropout layer in the CvT network, the R-Drop model is applied to the multiscale spatiotemporal feature fusion network. Experimental results on the Jester dataset show that the proposed method outperforms existing dynamic gesture recognition methods in recognition rate, reaching 92.26% on Jester.
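The abstract names R-Drop as the regularizer but does not give its formulation. As a rough illustration only, the sketch below follows the commonly published form of R-Drop: the same batch is passed through the network twice with independent dropout masks, and the loss is the sum of the two cross-entropy terms plus a weighted symmetric KL divergence between the two predicted distributions. The toy linear classifier, the helper names, the dropout rate, and the weight `alpha` are all assumptions for the example and are not taken from the paper.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(log_probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -log_probs[np.arange(len(labels)), labels].mean()

def kl_divergence(log_p, log_q):
    """KL(p || q), averaged over the batch."""
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean()

def r_drop_loss(logits_a, logits_b, labels, alpha=1.0):
    """Two cross-entropy terms plus alpha times the symmetric KL term.

    logits_a and logits_b are outputs of the same model on the same
    batch under two different dropout masks. alpha is a hypothetical
    weight; the abstract does not state the value used.
    """
    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    ce = cross_entropy(log_p, labels) + cross_entropy(log_q, labels)
    sym_kl = 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))
    return ce + alpha * sym_kl

def forward_with_dropout(features, weights, rate, rng):
    """Toy stand-in for a network: inverted dropout, then a linear layer."""
    mask = rng.random(features.shape) >= rate
    return (features * mask / (1.0 - rate)) @ weights

rng = np.random.default_rng(0)
num_classes = 27                          # Jester defines 27 gesture classes
features = rng.normal(size=(4, 8))        # hypothetical fused features
weights = rng.normal(size=(8, num_classes))
labels = np.array([0, 5, 12, 26])

# Two stochastic passes over the same batch give two sets of logits.
logits_a = forward_with_dropout(features, weights, 0.1, rng)
logits_b = forward_with_dropout(features, weights, 0.1, rng)
loss = r_drop_loss(logits_a, logits_b, labels, alpha=1.0)
print(float(loss))
```

When the two passes produce identical logits the KL term vanishes and the loss reduces to twice the ordinary cross-entropy, so the extra term only penalizes disagreement introduced by dropout.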

Pages: 2614-2622

Number of pages: 9

