Continuous Sign Language Recognition Based on Spatial-Temporal Graph Attention Network

Cited by: 5
Authors
Guo, Qi [1]
Zhang, Shujun [1]
Li, Hui [1]
Affiliation
[1] Qingdao Univ Sci & Technol, Coll Informat Sci & Technol, Qingdao 266061, Peoples R China
Source
CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES | 2023 / Vol. 134 / No. 03
Keywords
Continuous sign language recognition; graph attention network; bidirectional long short-term memory; connectionist temporal classification
DOI
10.32604/cmes.2022.021784
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Continuous sign language recognition (CSLR) is challenging due to complex video backgrounds, hand gesture variability, and the difficulty of temporal modeling. This work proposes a CSLR method based on a spatial-temporal graph attention network that focuses on the essential features of video sequences. The method captures local details of sign language movements by taking joint and bone information as input and constructing a spatial-temporal graph that reflects inter-frame relevance and the physical connections between nodes. A graph-based multi-head attention mechanism, combined with adjacency matrix calculation, is used for better local-feature exploration, and short-term motion correlation is modeled by a temporal convolutional network. A bidirectional long short-term memory (BLSTM) network learns long-term dependencies, and connectionist temporal classification aligns the word-level sequences. The proposed method achieves competitive results: a word error rate of 1.59% on the Chinese Sign Language dataset and a mean Jaccard Index of 65.78% on the ChaLearn LAP Continuous Gesture Dataset.
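The word error rate quoted in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognized sequence and the reference, divided by the reference length. The following is a minimal sketch of that common definition, not the authors' evaluation code; the function name `word_error_rate` and the example glosses are hypothetical, and the paper's exact scoring protocol may differ.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference).

    Assumption: this is the common edit-distance definition of WER,
    not necessarily the exact protocol used in the paper.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j]; it starts with the empty prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # distance from reference[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical gloss sequences, for illustration only.
reference = ["I", "GO", "SCHOOL", "TOMORROW"]
hypothesis = ["I", "GO", "HOME", "TOMORROW", "EARLY"]
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2%}")
```

In this example one substitution and one insertion against a four-word reference give a WER of 50%. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.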
Pages: 1653-1670
Number of pages: 18