Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition

Cited by: 2
Authors
Yin, Wenjie [1]
Hou, Yonghong [1]
Guo, Zihui [1]
Liu, Kailin [1]
Affiliation
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Feature extraction; Videos; Assistive technologies; Visualization; Gesture recognition; Data mining; Task analysis; Continuous sign language recognition; soft dynamic time warping; temporal difference; sequence learning; RECURRENT NEURAL-NETWORK; FRAMEWORK
DOI
10.1109/TCSVT.2023.3296668
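Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract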
Continuous Sign Language Recognition (CSLR) aims to generate gloss sequences from untrimmed sign videos. Since discriminative visual features are essential for CSLR, current efforts mainly focus on strengthening the feature extractor. The feature extractor can be decomposed into a spatial representation module and a short-term temporal module for spatial and visual feature modeling. However, existing methods usually treat it as a single block and rarely refine these two distinct modules individually, which makes it difficult to model spatial appearance information and temporal motion information effectively. To address these issues, we propose a spatial-temporal enhanced network that contains a spatial-visual alignment (SVA) module and a temporal feature difference (TFD) module. Specifically, the SVA module performs an auxiliary task between the spatial features and the target gloss sequences to enhance the extraction of hand and facial expression features. Meanwhile, the TFD module exploits the underlying dynamics between consecutive frames and injects the aggregated motion information into the spatial features to assist short-term temporal modeling. Extensive experimental results demonstrate the effectiveness of the proposed modules, and our network achieves state-of-the-art or competitive performance on four public CSLR datasets.
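The idea behind the TFD module (differences between consecutive frames, aggregated and injected back into the spatial features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `temporal_feature_difference`, the weight `alpha`, the averaging of backward and forward differences, and the additive injection are all assumptions made for illustration, and per-frame features are represented as plain lists of floats.

```python
from typing import List

Frame = List[float]  # one per-frame feature vector (hypothetical layout)


def temporal_feature_difference(features: List[Frame], alpha: float = 1.0) -> List[Frame]:
    """Add an aggregated neighbour-difference term to each frame's features.

    Assumption: motion is the mean of the backward difference (x_t - x_{t-1})
    and the forward difference (x_{t+1} - x_t); border frames reuse themselves
    as the missing neighbour, so that side contributes zero.
    """
    t = len(features)
    enhanced: List[Frame] = []
    for i, cur in enumerate(features):
        prev = features[max(i - 1, 0)]      # clamp at the first frame
        nxt = features[min(i + 1, t - 1)]   # clamp at the last frame
        motion = [0.5 * ((c - p) + (n - c)) for c, p, n in zip(cur, prev, nxt)]
        # Inject the motion cue into the spatial features (additive, assumed).
        enhanced.append([c + alpha * m for c, m in zip(cur, motion)])
    return enhanced


if __name__ == "__main__":
    frames = [[0.0, 1.0], [1.0, 3.0], [4.0, 3.0]]
    print(temporal_feature_difference(frames))
```

A static sequence yields zero motion everywhere, so the features pass through unchanged; only frames whose neighbours differ receive a correction. A learned module would typically replace the fixed averaging and the scalar `alpha` with trainable layers.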
Pages: 1684-1695
Number of pages: 12