Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition

Cited by: 2
Authors
Yin, Wenjie [1]
Hou, Yonghong [1]
Guo, Zihui [1]
Liu, Kailin [1]
Affiliations
[1] School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
Keywords
Feature extraction; Videos; Assistive technologies; Visualization; Gesture recognition; Data mining; Task analysis; Continuous sign language recognition; soft dynamic time warping; temporal difference; sequence learning; RECURRENT NEURAL-NETWORK; FRAMEWORK;
DOI
10.1109/TCSVT.2023.3296668
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
Continuous Sign Language Recognition (CSLR) aims to generate gloss sequences from untrimmed sign videos. Since discriminative visual features are essential for CSLR, current efforts mainly focus on strengthening the feature extractor. The feature extractor can be decomposed into a spatial representation module and a short-term temporal module, which model spatial and visual features respectively. However, existing methods generally treat it as a single block and rarely apply refinements specific to these two distinct modules, which makes it difficult to model spatial appearance information and temporal motion information effectively. To address these issues, we propose a spatial-temporal enhanced network that contains a spatial-visual alignment (SVA) module and a temporal feature difference (TFD) module. Specifically, the SVA module performs an auxiliary task between the spatial features and the target gloss sequences to enhance the extraction of hand and facial expressions. Meanwhile, the TFD module exploits the underlying dynamics between consecutive frames and injects the aggregated motion information into the spatial features to assist short-term temporal modeling. Extensive experimental results demonstrate the effectiveness of the proposed modules, and our network achieves state-of-the-art or competitive performance on four public CSLR datasets.
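The abstract describes the TFD module only at a high level: take differences between consecutive frames, aggregate them, and inject the result into the spatial features. The following is a minimal sketch of that general idea, not the authors' implementation. The function names `frame_differences` and `inject_motion`, the averaging of neighbouring differences, and the scaling weight `alpha` are all assumptions made for illustration; the paper may aggregate and fuse the motion information quite differently.

```python
from typing import List

Vector = List[float]


def frame_differences(features: List[Vector]) -> List[Vector]:
    """Return d[t] = features[t + 1] - features[t] for each adjacent pair."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(features, features[1:])
    ]


def inject_motion(features: List[Vector], alpha: float = 0.5) -> List[Vector]:
    """Add averaged neighbouring differences to each frame's features.

    `alpha` is a hypothetical fusion weight, not a value from the paper.
    """
    diffs = frame_differences(features)
    dim = len(features[0]) if features else 0
    enhanced = []
    for t, frame in enumerate(features):
        # Differences that touch frame t: incoming (t-1 -> t) and outgoing (t -> t+1).
        neighbours = []
        if t > 0:
            neighbours.append(diffs[t - 1])
        if t < len(features) - 1:
            neighbours.append(diffs[t])
        if neighbours:
            motion = [sum(vals) / len(neighbours) for vals in zip(*neighbours)]
        else:
            # A single-frame sequence carries no motion information.
            motion = [0.0] * dim
        enhanced.append([x + alpha * m for x, m in zip(frame, motion)])
    return enhanced


if __name__ == "__main__":
    # Three frames with two-dimensional per-frame features (toy values).
    toy = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
    print(inject_motion(toy))
```

In a real network the per-frame features would be tensors produced by a 2D backbone, and the fusion would typically be learned rather than a fixed weighted sum; the sketch only shows the data flow the abstract names.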
Pages: 1684-1695
Page count: 12
Related papers
  • [1] Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition
    Zhou, Hao
    Zhou, Wengang
    Zhou, Yun
    Li, Houqiang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13009 - 13016
  • [2] Continuous Sign Language Recognition Based on Spatial-Temporal Graph Attention Network
    Guo, Qi
    Zhang, Shujun
    Li, Hui
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2023, 134 (03): : 1653 - 1670
  • [3] Spatial-Temporal Multi-Cue Network for Sign Language Recognition and Translation
    Zhou, Hao
    Zhou, Wengang
    Zhou, Yun
    Li, Houqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 768 - 779
  • [5] Continuous Sign Language Recognition With Multi-Scale Spatial-Temporal Feature Enhancement
    Wang, Zhen
    Li, Dongyuan
    Jiang, Renhe
    Okumura, Manabu
    IEEE ACCESS, 2025, 13 : 5491 - 5506
  • [6] StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition
    Shen, Xiaolong
    Zheng, Zhedong
    Yang, Yi
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (07)
  • [7] Spatial-Temporal Graph Convolutional Networks for Sign Language Recognition
    de Amorim, Cleison Correia
    Macedo, David
    Zanchettin, Cleber
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: WORKSHOP AND SPECIAL SESSIONS, 2019, 11731 : 646 - 657
  • [8] Multiscale temporal network for continuous sign language recognition
    Zhu, Qidan
    Li, Jing
    Yuan, Fei
    Gan, Quan
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (02)
  • [9] STFE-Net: A Spatial-Temporal Feature Extraction Network for Continuous Sign Language Translation
    Hu, Jiwei
    Liu, Yunfei
    Lam, Kin-Man
    Lou, Ping
    IEEE ACCESS, 2023, 11 : 46204 - 46217
  • [10] Spatial-temporal attention with graph and general neural network-based sign language recognition
    Miah, Abu Saleh Musa
    Hasan, Md. Al Mehedi
    Okuyama, Yuichi
    Tomioka, Yoichi
    Shin, Jungpil
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (02)