Spatial-Temporal Enhanced Network for Continuous Sign Language Recognition

Cited by: 2
Authors
Yin, Wenjie [1]
Hou, Yonghong [1]
Guo, Zihui [1]
Liu, Kailin [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Feature extraction; Videos; Assistive technologies; Visualization; Gesture recognition; Data mining; Task analysis; Continuous sign language recognition; Soft dynamic time warping; Temporal difference; Sequence learning; Recurrent neural network; Framework
DOI
10.1109/TCSVT.2023.3296668
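Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract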
Continuous sign language recognition (CSLR) aims to generate gloss sequences from untrimmed sign videos. Since discriminative visual features are essential for CSLR, current efforts mainly focus on strengthening the feature extractor. The feature extractor can be decomposed into a spatial representation module and a short-term temporal module for modeling spatial and visual features. However, existing methods generally treat it as a monolithic block and rarely refine these two distinct modules individually, which makes it difficult to model spatial appearance information and temporal motion information effectively. To address these issues, we propose a spatial-temporal enhanced network that contains a spatial-visual alignment (SVA) module and a temporal feature difference (TFD) module. Specifically, the SVA module performs an auxiliary task between the spatial features and the target gloss sequences to enhance the extraction of hand and facial expressions. Meanwhile, the TFD module exploits the underlying dynamics between consecutive frames and injects the aggregated motion information into the spatial features to assist short-term temporal modeling. Extensive experimental results demonstrate the effectiveness of the proposed modules, and our network achieves state-of-the-art or competitive performance on four public CSLR datasets.
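The temporal-difference idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' TFD module: the abstract does not specify how differences are aggregated or injected, so the zero-padded first frame, the plain additive injection, and the names `temporal_feature_difference`, `inject_motion`, and `alpha` are all assumptions made for illustration.

```python
from typing import List

Frame = List[float]  # one feature vector per video frame (hypothetical layout)


def temporal_feature_difference(features: List[Frame]) -> List[Frame]:
    """Return the per-frame difference between consecutive feature vectors.

    Assumption: the first frame has no predecessor, so its difference
    is set to all zeros to keep the output the same length as the input.
    """
    if not features:
        return []
    diffs = [[0.0] * len(features[0])]
    for prev, curr in zip(features, features[1:]):
        diffs.append([c - p for c, p in zip(curr, prev)])
    return diffs


def inject_motion(features: List[Frame], alpha: float = 1.0) -> List[Frame]:
    """Add the scaled frame differences back onto the spatial features.

    Assumption: simple additive injection weighted by `alpha`; a real
    model would likely use learned layers here.
    """
    diffs = temporal_feature_difference(features)
    return [
        [f + alpha * d for f, d in zip(frame, diff)]
        for frame, diff in zip(features, diffs)
    ]


if __name__ == "__main__":
    spatial = [[1.0, 2.0], [2.0, 2.0], [4.0, 1.0]]
    print(temporal_feature_difference(spatial))
    print(inject_motion(spatial, alpha=0.5))
```

In this toy form, static frames contribute a zero difference and leave the spatial features unchanged, while frames with large changes are shifted in the direction of the motion.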
Pages: 1684-1695
Number of pages: 12