Multimodal fusion hierarchical self-attention network for dynamic hand gesture recognition

Cited by: 8

Authors:

Balaji, Pranav [1]

Prusty, Manas Ranjan [2]

Affiliations:

[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, India

[2] Vellore Inst Technol, Ctr Cyber Phys Syst, Chennai, India

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2024 / Vol. 98

Keywords:

Dynamic hand gesture recognition; Multimodal fusion; Cross-attention; Transformer; SHREC '17 track dataset;

DOI:

10.1016/j.jvcir.2023.104019

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Recent improvements in dynamic hand gesture recognition have seen a shift from traditional convolutional architectures to attention-based networks. These attention networks have been shown to outclass CNN + LSTM architectures, achieving higher accuracy with fewer model parameters. In particular, skeleton-based attention networks have been shown to outperform vision-based networks owing to the rich information in skeleton-based hand features. However, there is an opportunity to introduce complementary features from other modalities, such as RGB, depth, and optical flow images, to enhance the recognition capability of skeleton-based networks. This paper explores the addition of a multimodal fusion network to a skeleton-based Hierarchical Self-Attention Network (MF-HAN) and tests for increased model effectiveness. Unlike traditional fusion techniques, this fusion network uses features derived from other sources of multimodal data in a reduced feature space using a cross-attention layer. The model outperforms its root model and other state-of-the-art models on the SHREC'17 track dataset, especially in the 28-gesture setting, where it improves gesture classification accuracy by more than 1%. The model was also evaluated on the DHG dataset.
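The abstract names a cross-attention layer as the fusion mechanism but gives no architectural detail, so the following is only a minimal sketch of generic single-head cross-attention in which skeleton features act as queries over features from another modality. The function names, the dimensions, the random projection matrices, and the final concatenation step are all illustrative assumptions and are not taken from the paper.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(skeleton, other, w_q, w_k, w_v):
    """Single-head cross-attention: skeleton frames query the other modality."""
    q = matmul(skeleton, w_q)              # (T_s, d) queries from skeleton features
    k = matmul(other, w_k)                 # (T_o, d) keys from the other modality
    v = matmul(other, w_v)                 # (T_o, d) values from the other modality
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]   # (d, T_o) transpose of the keys
    scores = matmul(q, k_t)                # (T_s, T_o) similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T_s, T_o = 8, 5            # assumed number of skeleton frames and other-modality tokens
d_s, d_o, d = 16, 32, 4    # assumed feature sizes; d is the reduced attention space

skeleton = random_matrix(T_s, d_s, rng)
other = random_matrix(T_o, d_o, rng)
w_q = random_matrix(d_s, d, rng)
w_k = random_matrix(d_o, d, rng)
w_v = random_matrix(d_o, d, rng)

attended, weights = cross_attention(skeleton, other, w_q, w_k, w_v)

# Assumed fusion step: append the attended features to each skeleton frame.
fused = [s_row + a_row for s_row, a_row in zip(skeleton, attended)]
```

Here the projection into a `d`-dimensional space stands in for the "reduced feature space" mentioned above; a real model would learn `w_q`, `w_k`, and `w_v` and would likely use several heads.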


Pages: 11
