Real-Time Dynamic Gesture Recognition Algorithm Based on Adaptive Information Fusion and Multi-Scale Optimization Transformer

Cited by: 1
Authors
Lu, Guangda [1, 2]
Sun, Wenhao [1, 2]
Qin, Zhuanping [1, 2]
Guo, Tinghang [1, 2]
Affiliations
[1] Tianjin Univ Technol & Educ, Sch Automat & Elect Engn, 1310 Dagu South Rd, Tianjin 300222, Peoples R China
[2] Tianjin Key Lab Informat Sensing & Intelligent Co, 1310 DaGu South Rd, Tianjin 300222, Peoples R China
Keywords
dynamic gesture recognition; Transformer; optical flow; information fusion
DOI
10.20965/jaciii.2023.p1096
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Gesture recognition is a popular technology in computer vision and an important technical means of achieving human-computer interaction. To address problems such as the limited long-range feature extraction capability of existing dynamic gesture recognition networks based on convolutional operators, we propose a dynamic gesture recognition algorithm based on a spatial pyramid pooling Transformer and optical flow information fusion. We exploit the Transformer's large receptive field to reduce model computation, and we embed spatial pyramid pooling to improve the model's ability to extract features at different scales. We use an optical flow algorithm with the global motion aggregation module to obtain an optical flow map of hand motion and to extract key frames based on the similarity minimization principle. We also design an adaptive feature fusion method to fuse the spatial and temporal features of the two channels. Ablation experiments demonstrate how each model component contributes to recognition performance. We conduct training and validation on the SCUT-DHGA dynamic gesture dataset and on a dataset we collected, and we perform real-time dynamic gesture recognition tests using the trained model. The results show that our algorithm achieves high accuracy while keeping the parameter count balanced. It also achieves fast and accurate recognition of dynamic gestures in real-time tests.
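The abstract does not spell out how key frames are chosen under the similarity minimization principle. As a rough illustration only, the sketch below assumes each frame is already reduced to a flat feature vector (for example, a flattened optical flow map) and greedily picks frames that are least similar to those already chosen. The function names `cosine_similarity` and `select_key_frames`, the use of cosine similarity, and the greedy strategy are all assumptions for illustration, not the authors' published procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero frame as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_key_frames(frames, k):
    """Greedily pick up to k frame indices with low mutual similarity.

    Hypothetical helper (not from the paper): starts from frame 0, then
    repeatedly adds the frame whose highest similarity to any frame
    already selected is the smallest.
    """
    if not frames or k <= 0:
        return []
    selected = [0]
    while len(selected) < min(k, len(frames)):
        best_index, best_score = None, float("inf")
        for i, frame in enumerate(frames):
            if i in selected:
                continue
            # Worst-case (largest) similarity to the current key frames.
            score = max(cosine_similarity(frame, frames[j]) for j in selected)
            if score < best_score:
                best_index, best_score = i, score
        selected.append(best_index)
    return sorted(selected)  # restore temporal order
```

For four toy frames `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]`, calling `select_key_frames(frames, 2)` keeps frames 0 and 2, the pair with the lowest similarity to each other. A real pipeline would likely compare learned or flow-based features rather than raw vectors.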
Pages: 1096-1107
Page count: 12