Multimodal Fusion Framework Based on Statistical Attention and Contrastive Attention for Sign Language Recognition

Cited by: 17
Authors
Zhang, Jiangtao [1 ]
Wang, Qingshan [1 ]
Wang, Qi [1 ]
Zheng, Zhiwen [1 ]
Affiliations
[1] Hefei Univ Technol, Sch Math, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Gesture recognition; assistive technologies; feature extraction; skeleton; hidden Markov models; motion detection; robot sensing systems; sign language recognition; wearable computing; multimodal fusion; sEMG; deep learning; Laplacian operator; field; shape
DOI
10.1109/TMC.2023.3235935
Chinese Library Classification
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
Sign language recognition (SLR) enables hearing-impaired people to communicate better with able-bodied individuals. The diversity of multiple modalities can be exploited to improve SLR, but existing multimodal fusion methods do not model multimodal interrelationships in depth. This paper proposes SeeSign, a multimodal fusion framework based on statistical attention and contrastive attention for SLR. The two designed attention mechanisms investigate the intra-modal and inter-modal correlations of surface electromyography (sEMG) and inertial measurement unit (IMU) signals and fuse the two modalities. Statistical attention uses the Laplace operator and the lower quantile to select and enhance active features within each modal feature clip. Contrastive attention calculates the information gain of active features in a pair of enhanced feature clips located at the same position in the two modalities; the enhanced clips are then fused position-wise according to that gain. The fused multimodal features are fed into a Transformer-based network trained with connectionist temporal classification and cross-entropy losses for SLR. Experimental results show that SeeSign achieves an accuracy of 93.17% on isolated words and word error rates of 18.34% and 22.08% on one-handed and two-handed sign language datasets, respectively, outperforming state-of-the-art methods in both accuracy and robustness.
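The two attention steps summarized above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the exact Laplacian form, quantile level, information-gain definition, enhancement factor, and fusion rule are guesses consistent with the abstract, and the function names (`statistical_attention`, `contrastive_fusion`) are hypothetical.

```python
import numpy as np

def statistical_attention(clip, q=0.25):
    """Select and enhance 'active' features in one modality's feature clip.

    clip: (T, C) array -- T time steps, C feature channels.
    A discrete 1-D Laplacian along time estimates local feature activity;
    features whose activity reaches the lower quantile `q` are treated as
    active and amplified (here, simply doubled).
    """
    padded = np.pad(clip, ((1, 1), (0, 0)), mode="edge")
    lap = np.abs(padded[:-2] - 2.0 * padded[1:-1] + padded[2:])  # (T, C)
    thresh = np.quantile(lap, q)          # lower-quantile activity floor
    mask = lap >= thresh                  # active-feature mask
    enhanced = clip * (1.0 + mask.astype(clip.dtype))
    return enhanced, mask

def contrastive_fusion(clip_a, clip_b, eps=1e-8):
    """Fuse two same-position enhanced clips by an entropy-based 'gain'.

    A peaked (low-entropy) activity distribution is taken as more
    informative, so each modality is weighted by softmax(-entropy).
    """
    def entropy(x):
        p = np.abs(x).ravel()
        p = p / (p.sum() + eps)
        return -(p * np.log(p + eps)).sum()

    g = np.array([-entropy(clip_a), -entropy(clip_b)])
    w = np.exp(g - g.max())
    w = w / w.sum()                       # modality weights, sum to 1
    return w[0] * clip_a + w[1] * clip_b
```

In the paper the fused features then feed a Transformer-based recognizer; the sketch stops at the fusion step, which is where the two attention mechanisms operate.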
Pages: 1431-1443
Page count: 13