Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition

Authors
Yin, Wenjie [1]
Hou, Yonghong [1]
Guo, Zihui [1]
Liu, Kailin [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Feature extraction; Videos; Assistive technologies; Visualization; Gesture recognition; Data mining; Task analysis; Continuous sign language recognition; soft dynamic time warping; temporal difference; sequence learning; RECURRENT NEURAL-NETWORK; FRAMEWORK
DOI
10.1109/TCSVT.2023.3296668
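Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract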
Continuous Sign Language Recognition (CSLR) aims to generate gloss sequences from untrimmed sign videos. Since discriminative visual features are essential for CSLR, current efforts mainly focus on strengthening the feature extractor. The feature extractor can be decomposed into a spatial representation module and a short-term temporal module for spatial and visual feature modeling. However, existing methods treat it as a single block and rarely apply refinements specific to these two distinct modules, which makes it difficult to model spatial appearance information and temporal motion information effectively. To address these issues, we propose a spatial-temporal enhanced network that contains a spatial-visual alignment (SVA) module and a temporal feature difference (TFD) module. Specifically, the SVA module conducts an auxiliary task between the spatial features and the target gloss sequences to enhance the extraction of hand and facial expressions. Meanwhile, the TFD module exploits the underlying dynamics between consecutive frames and injects the aggregated motion information into the spatial features to assist short-term temporal modeling. Extensive experimental results demonstrate the effectiveness of the proposed modules, and our network achieves state-of-the-art or competitive performance on four public CSLR datasets.
Pages: 1684-1695
Number of pages: 12