Multimodal fusion hierarchical self-attention network for dynamic hand gesture recognition

Cited by: 8
Authors
Balaji, Pranav [1 ]
Prusty, Manas Ranjan [2 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, India
[2] Vellore Inst Technol, Ctr Cyber Phys Syst, Chennai, India
Keywords
Dynamic hand gesture recognition; Multimodal fusion; Cross-attention; Transformer; SHREC '17 track dataset
DOI
10.1016/j.jvcir.2023.104019
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
Recent improvements in dynamic hand gesture recognition have seen a shift from traditional convolutional architectures to attention-based networks. These attention networks have been shown to outclass CNN + LSTM architectures, achieving higher accuracy with fewer model parameters. In particular, skeleton-based attention networks outperform visual-based networks owing to the rich information carried by skeleton-based hand features. However, there is an opportunity to introduce complementary features from other modalities, such as RGB, depth, and optical-flow images, to enhance the recognition capability of skeleton-based networks. This paper explores adding a multimodal fusion network to a skeleton-based Hierarchical Self-Attention Network (MF-HAN) and tests whether it increases model effectiveness. Unlike traditional fusion techniques, this fusion network incorporates features derived from other sources of multimodal data in a reduced feature space using a cross-attention layer. The model outperforms its root model and other state-of-the-art models on the SHREC'17 track dataset, most notably by more than 1% in gesture classification accuracy in the 28-gesture setting. The model was also evaluated on the DHG dataset.
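The cross-attention fusion step described in the abstract can be illustrated with a short sketch. The PyTorch module below is a minimal, hypothetical reconstruction based only on the abstract: skeleton features act as attention queries, while features from a complementary modality (RGB, depth, or optical flow) are projected into a reduced feature space and serve as keys and values. All names, dimensions, and the residual connection are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative sketch: fuse visual-modality features into a skeleton feature stream."""
    def __init__(self, skel_dim=256, visual_dim=512, reduced_dim=64, num_heads=4):
        super().__init__()
        # Project both modalities into a shared, reduced feature space
        # (dimensions here are assumptions, not values from the paper).
        self.q_proj = nn.Linear(skel_dim, reduced_dim)
        self.kv_proj = nn.Linear(visual_dim, reduced_dim)
        self.attn = nn.MultiheadAttention(reduced_dim, num_heads, batch_first=True)
        # Map the fused features back to the skeleton stream's width so they
        # can be added residually to the self-attention backbone.
        self.out_proj = nn.Linear(reduced_dim, skel_dim)

    def forward(self, skel_feats, visual_feats):
        # skel_feats:   (batch, T_s, skel_dim)   skeleton token sequence
        # visual_feats: (batch, T_v, visual_dim) RGB/depth/optical-flow tokens
        q = self.q_proj(skel_feats)
        kv = self.kv_proj(visual_feats)
        # Skeleton queries attend over visual keys/values (cross-attention).
        fused, _ = self.attn(q, kv, kv)
        return skel_feats + self.out_proj(fused)  # residual fusion

# Usage: fuse a 32-step skeleton sequence with 16 visual tokens.
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 32, 256), torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 32, 256])

In the paper's setting, the fused output would feed back into the skeleton-based hierarchical self-attention backbone; performing attention in a reduced feature space keeps the added parameter count small, consistent with the abstract's emphasis on compact attention models.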
Pages: 11