Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition

Cited: 2
Authors
Yin, Wenjie [1 ]
Hou, Yonghong [1 ]
Guo, Zihui [1 ]
Liu, Kailin [1 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Feature extraction; Videos; Assistive technologies; Visualization; Gesture recognition; Data mining; Task analysis; Continuous sign language recognition; soft dynamic time warping; temporal difference; sequence learning; RECURRENT NEURAL-NETWORK; FRAMEWORK;
DOI
10.1109/TCSVT.2023.3296668
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification
0808; 0809
Abstract
Continuous Sign Language Recognition (CSLR) aims to generate gloss sequences from untrimmed sign videos. Since discriminative visual features are essential for CSLR, current efforts mainly focus on strengthening the feature extractor. The feature extractor can be decomposed into a spatial representation module and a short-term temporal module, responsible for modeling spatial and temporal features respectively. However, existing methods usually treat the extractor as a monolithic block and rarely apply refinements tailored to these two distinct modules, which makes it difficult to model spatial appearance information and temporal motion information effectively. To address these issues, we propose a spatial-temporal enhanced network that contains a spatial-visual alignment (SVA) module and a temporal feature difference (TFD) module. Specifically, the SVA module performs an auxiliary alignment task between the spatial features and the target gloss sequences to enhance the extraction of hand and facial expression features. Meanwhile, the TFD module exploits the underlying dynamics between consecutive frames and injects the aggregated motion information into the spatial features to assist short-term temporal modeling. Extensive experiments demonstrate the effectiveness of the proposed modules, and our network achieves state-of-the-art or competitive performance on four public CSLR datasets.
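The abstract describes the two modules only at a high level. As a rough illustration of the temporal-difference idea behind the TFD module, the PyTorch snippet below computes consecutive-frame feature differences, aggregates them, and adds the result back into the spatial features. The specific layer choices (a 3-tap temporal convolution, residual injection) and the class name are assumptions made for illustration, not the authors' actual TFD design.

# Minimal sketch of a temporal-feature-difference block (assumed design,
# not the paper's TFD module).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalFeatureDifference(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical aggregation of the raw difference signal.
        self.aggregate = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -- per-frame spatial features.
        diff = x[:, :, 1:] - x[:, :, :-1]   # motion between consecutive frames
        diff = F.pad(diff, (1, 0))          # restore the temporal length
        return x + self.aggregate(diff)     # inject motion into spatial features

The keyword list also names soft dynamic time warping, presumably the alignment objective behind the SVA module's auxiliary task. A textbook soft-DTW recurrence (Cuturi and Blondel, 2017) over a precomputed frame-to-gloss cost matrix looks as follows; again, this is a generic sketch rather than the paper's implementation.

import numpy as np

def soft_dtw(cost: np.ndarray, gamma: float = 1.0) -> float:
    # cost: (n_frames, n_glosses) pairwise distances; gamma > 0 smooths the min.
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r = np.array([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]])
            rmin = r.min()  # stabilized soft-min over the three DTW predecessors
            softmin = rmin - gamma * np.log(np.sum(np.exp(-(r - rmin) / gamma)))
            R[i, j] = cost[i - 1, j - 1] + softmin
    return R[n, m]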
Pages: 1684-1695
Page count: 12
Related Papers
50 records in total
  • [31] Multi-Information Spatial-Temporal LSTM Fusion Continuous Sign Language Neural Machine Translation
    Xiao, Qinkun
    Chang, Xin
    Zhang, Xue
    Liu, Xing
    IEEE ACCESS, 2020, 8 : 216718 - 216728
  • [32] An Attention Enhanced Spatial-Temporal Graph Convolutional LSTM Network for Action Recognition in Karate
    Guo, Jianping
    Liu, Hong
    Li, Xi
    Xu, Dahong
    Zhang, Yihan
    APPLIED SCIENCES-BASEL, 2021, 11 (18)
  • [33] Spatial-Temporal Convolutional Attention Network for Action Recognition
    Luo, Huilan
    Chen, Han
    Computer Engineering and Applications, 2023, 59 (09): 150 - 158
  • [34] Spatial-Temporal Recurrent Neural Network for Emotion Recognition
    Zhang, Tong
    Zheng, Wenming
    Cui, Zhen
    Zong, Yuan
    Li, Yang
    IEEE TRANSACTIONS ON CYBERNETICS, 2019, 49 (03) : 839 - 847
  • [35] Spatial-Temporal Interleaved Network for Efficient Action Recognition
    Jiang, Shengqin
    Zhang, Haokui
    Qi, Yuankai
    Liu, Qingshan
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2025, 21 (01) : 178 - 187
  • [36] Iterative Alignment Network for Continuous Sign Language Recognition
    Pu, Junfu
    Zhou, Wengang
    Li, Houqiang
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 4160 - 4169
  • [37] Rethinking the temporal downsampling paradigm for continuous sign language recognition
    Liu, Caifeng
    Hu, Lianyu
    MULTIMEDIA SYSTEMS, 2025, 31 (02)
  • [38] Spatial-temporal Graph Transformer Network for Spatial-temporal Forecasting
    Dao, Minh-Son
    Zetsu, Koji
    Hoang, Duy-Tang
    Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024, 2024: 1276 - 1281
  • [39] Multimodal Locally Enhanced Transformer for Continuous Sign Language Recognition
    Papadimitriou, Katerina
    Potamianos, Gerasimos
    INTERSPEECH 2023, 2023, : 1513 - 1517
  • [40] A Local Spatial-Temporal Synchronous Network to Dynamic Gesture Recognition
    Zhao, Dongdong
    Yang, Qinglian
    Zhou, Xingwen
    Li, Hongli
    Yan, Shi
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (05) : 2226 - 2233