Multimodal fusion hierarchical self-attention network for dynamic hand gesture recognition

Cited by: 8
Authors
Balaji, Pranav [1 ]
Prusty, Manas Ranjan [2 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, India
[2] Vellore Inst Technol, Ctr Cyber Phys Syst, Chennai, India
Keywords
Dynamic hand gesture recognition; Multimodal fusion; Cross-attention; Transformer; SHREC '17 track dataset
DOI
10.1016/j.jvcir.2023.104019
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Recent improvements in dynamic hand gesture recognition have seen a shift from traditional convolutional architectures to attention-based networks. These attention networks have been shown to outclass CNN + LSTM architectures, achieving higher accuracy with fewer model parameters. In particular, skeleton-based attention networks outperform visual-based networks owing to the rich information carried by skeleton-based hand features. However, there is an opportunity to introduce complementary features from other modalities, such as RGB, depth, and optical-flow images, to enhance the recognition capability of skeleton-based networks. This paper explores adding a multimodal fusion network to a skeleton-based Hierarchical Self-Attention Network (MF-HAN) and tests whether it increases model effectiveness. Unlike traditional fusion techniques, this fusion network incorporates features derived from other sources of multimodal data in a reduced feature space using a cross-attention layer. The model outperforms its root model and other state-of-the-art models on the SHREC'17 track dataset, most notably by more than 1% in gesture classification accuracy in the 28-gesture setting. The model was also evaluated on the DHG dataset.
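The cross-attention fusion step described in the abstract can be illustrated with a short sketch. The PyTorch module below is a minimal, hypothetical reconstruction based only on the abstract: skeleton features act as attention queries, while features from a complementary modality (RGB, depth, or optical flow) are projected into a reduced feature space and serve as keys and values. All names, dimensions, and the residual connection are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative sketch: fuse visual-modality features into a skeleton feature stream."""
    def __init__(self, skel_dim=256, visual_dim=512, reduced_dim=64, num_heads=4):
        super().__init__()
        # Project both modalities into a shared, reduced feature space
        # (dimensions here are assumptions, not values from the paper).
        self.q_proj = nn.Linear(skel_dim, reduced_dim)
        self.kv_proj = nn.Linear(visual_dim, reduced_dim)
        self.attn = nn.MultiheadAttention(reduced_dim, num_heads, batch_first=True)
        # Map the fused features back to the skeleton stream's width so they
        # can be added residually to the self-attention backbone.
        self.out_proj = nn.Linear(reduced_dim, skel_dim)

    def forward(self, skel_feats, visual_feats):
        # skel_feats:   (batch, T_s, skel_dim)   skeleton token sequence
        # visual_feats: (batch, T_v, visual_dim) RGB/depth/optical-flow tokens
        q = self.q_proj(skel_feats)
        kv = self.kv_proj(visual_feats)
        # Skeleton queries attend over visual keys/values (cross-attention).
        fused, _ = self.attn(q, kv, kv)
        return skel_feats + self.out_proj(fused)  # residual fusion

# Usage: fuse a 32-step skeleton sequence with 16 visual tokens.
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 32, 256), torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 32, 256])

In the paper's setting, the fused output would feed back into the skeleton-based hierarchical self-attention backbone; performing attention in a reduced feature space keeps the added parameter count small, consistent with the abstract's emphasis on compact attention models.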
Pages: 11