Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition

Cited by: 0
Authors
Zhou, Hao [1 ]
Zhou, Wengang [1 ]
Zhou, Yun [1 ]
Li, Houqiang [1 ]
Affiliations
[1] Univ Sci & Technol China, CAS Key Lab GIPAS, Beijing, Peoples R China
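Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract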
Despite the recent success of deep learning in continuous sign language recognition (CSLR), deep models typically focus on the most discriminative features, ignoring other potentially non-trivial and informative contents. This characteristic heavily constrains their capability to learn implicit visual grammars behind the collaboration of different visual cues (i.e., hand shape, facial expression and body posture). By injecting multi-cue learning into neural network design, we propose a spatial-temporal multi-cue (STMC) network to solve the vision-based sequence learning problem. Our STMC network consists of a spatial multi-cue (SMC) module and a temporal multi-cue (TMC) module. The SMC module is dedicated to spatial representation and explicitly decomposes visual features of different cues with the aid of a self-contained pose estimation branch. The TMC module models temporal correlations along two parallel paths, i.e., intra-cue and inter-cue, which aim to preserve the uniqueness and explore the collaboration of multiple cues. Finally, we design a joint optimization strategy to achieve the end-to-end sequence learning of the STMC network. To validate the effectiveness, we perform experiments on three large-scale CSLR benchmarks: PHOENIX-2014, CSL and PHOENIX-2014-T. Experimental results demonstrate that the proposed method achieves new state-of-the-art performance on all three benchmarks.
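The two-path idea in the abstract (an intra-cue path that keeps each cue separate and an inter-cue path that lets cues interact) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the cue names, feature sizes, kernel width, random weights, and the helper names temporal_conv and two_path_block are all assumptions made for illustration, and a plain temporal convolution with ReLU stands in for whatever layers the paper actually uses.

```python
import numpy as np


def temporal_conv(x, w):
    """1D convolution over time with zero padding.

    x: (T, C_in) per-frame features; w: (K, C_in, C_out) filter bank.
    Returns an array of shape (T, C_out).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # Slide a K-frame window over time and project it to C_out channels.
    return np.stack(
        [np.einsum("kc,kco->o", xp[t:t + k], w) for t in range(x.shape[0])]
    )


def two_path_block(cues, intra_w, inter_w):
    """Hypothetical two-path temporal block (illustrative only).

    cues: dict mapping cue name -> (T, C) features.
    Returns (dict of per-cue intra outputs, fused inter output).
    """
    # Intra-cue path: each cue is filtered with its own weights,
    # so no information flows between cues.
    intra = {
        name: np.maximum(temporal_conv(x, intra_w[name]), 0.0)
        for name, x in cues.items()
    }
    # Inter-cue path: cues are concatenated along the channel axis
    # before filtering, so one filter bank can mix all of them.
    fused = np.concatenate([cues[name] for name in sorted(cues)], axis=1)
    inter = np.maximum(temporal_conv(fused, inter_w), 0.0)
    return intra, inter


# Assumed toy setup: 16 frames, 8 channels per cue, kernel width 5.
rng = np.random.default_rng(0)
T, C, K = 16, 8, 5
names = ["hand", "face", "body"]  # assumed labels for the cues in the abstract
cues = {n: rng.standard_normal((T, C)) for n in names}
intra_w = {n: rng.standard_normal((K, C, C)) * 0.1 for n in names}
inter_w = rng.standard_normal((K, C * len(names), C)) * 0.1

intra, inter = two_path_block(cues, intra_w, inter_w)
print({n: v.shape for n, v in intra.items()}, inter.shape)
```

In this sketch, changing the face features leaves the intra-cue output for the hand untouched but does alter the inter-cue output, which is the separation between preserving uniqueness and exploring collaboration that the abstract describes.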
Pages: 13009 - 13016
Page count: 8