Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Cited by: 0
Authors
Zhou, Hao [1]
Zhou, Wengang [1]
Zhou, Yun [1]
Li, Houqiang [1]
Affiliation
[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Hefei, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn the implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness of each cue and explore the collaboration among multiple cues. Finally, we design a joint optimization strategy to achieve end-to-end sequence learning of the STMC network. To validate its effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
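The abstract describes the TMC module only at the level of its two parallel paths. As a rough illustration of that idea, and not the authors' implementation, the sketch below assumes per-cue feature sequences of equal length, models each path as a single temporal convolution followed by ReLU, and builds the inter-cue input by concatenating all cues frame by frame. The names `temporal_conv_relu`, `tmc_block` and `ConvParams`, the zero "same" padding, and the use of plain Python lists instead of a deep learning framework are hypothetical choices made for illustration; the pose estimation branch of the SMC module and the joint optimization strategy are not modeled.

```python
from typing import Dict, List, Tuple

# A feature sequence: T frames, each a list of C channel values.
Sequence = List[List[float]]
# Hypothetical conv parameters: weights[k][c_out][c_in] for kernel tap k, plus bias[c_out].
ConvParams = Tuple[List[List[List[float]]], List[float]]


def temporal_conv_relu(seq: Sequence, params: ConvParams) -> Sequence:
    """1D convolution over time with zero 'same' padding (odd kernel size), then ReLU."""
    weights, bias = params
    num_frames, kernel_size = len(seq), len(weights)
    pad = kernel_size // 2
    out: Sequence = []
    for t in range(num_frames):
        acc = list(bias)
        for k in range(kernel_size):
            src = t + k - pad
            if 0 <= src < num_frames:  # frames outside the sequence count as zeros
                for o, w_row in enumerate(weights[k]):
                    acc[o] += sum(w * x for w, x in zip(w_row, seq[src]))
        out.append([max(0.0, v) for v in acc])  # ReLU
    return out


def tmc_block(
    cues: Dict[str, Sequence],
    intra_params: Dict[str, ConvParams],
    inter_params: ConvParams,
) -> Tuple[Dict[str, Sequence], Sequence]:
    """Hypothetical two-path temporal block over per-cue feature sequences."""
    names = sorted(cues)  # fixed cue order for concatenation
    num_frames = len(cues[names[0]])
    assert all(len(cues[n]) == num_frames for n in names), "cues must share the time axis"

    # Intra-cue path: each cue keeps its own kernel, so cue-specific patterns stay separate.
    intra = {n: temporal_conv_relu(cues[n], intra_params[n]) for n in names}

    # Inter-cue path: concatenate all cues frame by frame and apply one shared kernel,
    # so each output channel can mix information across cues.
    fused = [sum((cues[n][t] for n in names), []) for t in range(num_frames)]
    inter = temporal_conv_relu(fused, inter_params)
    return intra, inter


if __name__ == "__main__":
    # Toy example: two one-channel cues over three frames, kernel size 1.
    cues = {"hand": [[1.0], [2.0], [-1.0]], "face": [[0.5], [0.0], [3.0]]}
    identity = ([[[1.0]]], [0.0])
    intra, inter = tmc_block(cues, {"hand": identity, "face": identity}, ([[[1.0, 1.0]]], [0.0]))
    print(inter)  # [[1.5], [2.0], [2.0]]
```

Keeping a separate kernel per cue in the intra-cue path means one cue's weights never see another cue's channels, while the single shared kernel in the inter-cue path is the only place where cross-cue mixing happens; this is one simple way to read "preserve the uniqueness" versus "explore the collaboration" in the abstract.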
Pages: 13009-13016
Number of pages: 8