Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition

Times cited: 2
Authors
Yin, Wenjie [1]
Hou, Yonghong [1]
Guo, Zihui [1]
Liu, Kailin [1]
Affiliation
[1] Tianjin University, School of Electrical and Information Engineering, Tianjin 300072, People's Republic of China
Keywords
Feature extraction; Videos; Assistive technologies; Visualization; Gesture recognition; Data mining; Task analysis; Continuous sign language recognition; soft dynamic time warping; temporal difference; sequence learning; recurrent neural network; framework
DOI
10.1109/TCSVT.2023.3296668
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
Continuous sign language recognition (CSLR) aims to generate gloss sequences from untrimmed sign videos. Since discriminative visual features are essential for CSLR, current efforts mainly focus on strengthening the feature extractor. The feature extractor can be decomposed into a spatial representation module and a short-term temporal module, which model spatial and temporal features, respectively. However, existing methods usually treat the extractor as a monolithic block and rarely refine these two distinct modules individually, which makes it difficult to model spatial appearance information and temporal motion information effectively. To address this issue, we propose a spatial-temporal enhanced network that contains a spatial-visual alignment (SVA) module and a temporal feature difference (TFD) module. Specifically, the SVA module performs an auxiliary task between the spatial features and the target gloss sequences to enhance the extraction of hand and facial expression features. Meanwhile, the TFD module exploits the underlying dynamics between consecutive frames and injects the aggregated motion information into the spatial features to assist short-term temporal modeling. Extensive experimental results demonstrate the effectiveness of the proposed modules, and our network achieves state-of-the-art or competitive performance on four public CSLR datasets.
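The general idea behind the TFD description (take differences between consecutive frame features, aggregate them, and inject the result back into the spatial features) can be illustrated with a minimal pure-Python sketch. This is not the authors' TFD module: the forward-difference definition, the centered averaging window, the fixed residual weight `alpha`, and the helper names `temporal_differences`, `aggregate_motion`, and `inject_motion` are all hypothetical choices made for illustration. Per-frame features are assumed to be plain lists of floats forming a T x C sequence.

```python
from typing import List

Frame = List[float]  # one per-frame feature vector with C channels (assumed layout)


def temporal_differences(feats: List[Frame]) -> List[Frame]:
    """Forward differences D[t] = F[t+1] - F[t]; the last step is zero-padded
    so the output keeps the input length T."""
    if not feats:
        return []
    channels = len(feats[0])
    diffs: List[Frame] = []
    for t in range(len(feats)):
        if t + 1 < len(feats):
            diffs.append([nxt - cur for cur, nxt in zip(feats[t], feats[t + 1])])
        else:
            diffs.append([0.0] * channels)
    return diffs


def aggregate_motion(diffs: List[Frame], window: int = 3) -> List[Frame]:
    """Hypothetical aggregation: average each difference over a centered
    window of neighbouring time steps (truncated at the clip boundaries)."""
    half = window // 2
    out: List[Frame] = []
    for t in range(len(diffs)):
        lo, hi = max(0, t - half), min(len(diffs), t + half + 1)
        n = hi - lo
        out.append([sum(diffs[k][c] for k in range(lo, hi)) / n
                    for c in range(len(diffs[t]))])
    return out


def inject_motion(feats: List[Frame], alpha: float = 0.5, window: int = 3) -> List[Frame]:
    """Residual injection F'[t] = F[t] + alpha * M[t], where M is the
    aggregated motion; alpha is a fixed stand-in for a learned weight."""
    motion = aggregate_motion(temporal_differences(feats), window)
    return [[f + alpha * m for f, m in zip(frame, mot)]
            for frame, mot in zip(feats, motion)]


# Usage: three frames with two channels; only channel 0 moves.
clip = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
print(inject_motion(clip))  # [[0.75, 1.0], [1.5, 1.0], [3.5, 1.0]]
```

Because the injection is residual, a static clip (all frames equal) yields zero differences and the spatial features pass through unchanged; in an actual network the fixed `alpha` and averaging window would presumably be replaced by learned layers.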
Pages: 1684-1695
Number of pages: 12