Multimodal fusion hierarchical self-attention network for dynamic hand gesture recognition

Cited by: 8
Authors
Balaji, Pranav [1]
Prusty, Manas Ranjan [2]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, India
[2] Vellore Inst Technol, Ctr Cyber Phys Syst, Chennai, India
Keywords
Dynamic hand gesture recognition; Multimodal fusion; Cross-attention; Transformer; SHREC'17 track dataset
DOI
10.1016/j.jvcir.2023.104019
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recent improvements in dynamic hand gesture recognition have seen a shift from traditional convolutional architectures to attention-based networks. These attention networks have been shown to outclass CNN + LSTM architectures, achieving higher accuracy with fewer model parameters. In particular, skeleton-based attention networks have been shown to outperform visual-based networks due to the rich information in skeleton-based hand features. However, there is an opportunity to introduce complementary features from other modalities, such as RGB, depth, and optical flow images, to enhance the recognition capability of skeleton-based networks. This paper explores the addition of a multimodal fusion network to a skeleton-based Hierarchical Self-Attention Network (MF-HAN) and tests for increased model effectiveness. Unlike traditional fusion techniques, this fusion network uses features derived from other sources of multimodal data in a reduced feature space using a cross-attention layer. The model outperforms its root model and other state-of-the-art models on the SHREC'17 track dataset, especially in the 28-gesture setting, where it improves gesture classification accuracy by more than 1%. The model was also evaluated on the DHG dataset.
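The abstract names a cross-attention layer as the fusion mechanism but gives no further detail. The following is a minimal sketch of generic scaled dot-product cross-attention, not the authors' implementation. It assumes that skeleton features act as queries and that features from one other modality (here called depth) act as keys and values. The names `cross_attention`, `skeleton_feats`, and `depth_feats` are hypothetical, and the learned projection matrices and the reduced feature space mentioned in the abstract are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: list of vectors from one modality (assumed: skeleton features)
    keys, values: lists of vectors from another modality (assumed: depth)
    Returns one fused vector per query, each a weighted mix of `values`.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key in the other modality.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return fused


# Toy inputs (hypothetical): 2 skeleton frames, 3 depth feature vectors.
skeleton_feats = [[1.0, 0.0], [0.0, 1.0]]
depth_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(skeleton_feats, depth_feats, depth_feats)
```

In this sketch each skeleton frame receives a convex combination of the depth vectors, weighted toward the depth vectors it is most similar to. A trained model would normally apply learned linear projections to the queries, keys, and values first.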
Pages: 11