Sign Pose-based Transformer for Word-level Sign Language Recognition

Cited by: 77
Authors
Bohacek, Matyas [1]
Hruz, Marek [1]
Affiliation
[1] Univ West Bohemia, Fac Appl Sci, Dept Cybernet & New Technol Informat Soc, Tech 8, Plzen 30100, Czech Republic
Source
2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022) | 2022
DOI
10.1109/WACVW54805.2022.00024
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a system for word-level sign language recognition based on the Transformer model. We aim for a solution with low computational cost, since we see great potential in using such a recognition system on hand-held devices. We base the recognition on estimating the pose of the human body in the form of 2D landmark locations. We introduce a robust pose normalization scheme that takes the signing space into consideration and processes the hand poses in a separate local coordinate system, independent of the body pose. We show experimentally that this normalization has a significant impact on the accuracy of our proposed system. We introduce several augmentations of the body pose that further improve accuracy, including a novel sequential joint rotation augmentation. With all components in place, we achieve state-of-the-art top-1 results on the WLASL and LSA64 datasets. On WLASL, we correctly recognize 63.18% of sign recordings in the 100-gloss subset, a relative improvement of 5% over the prior state of the art. On the 300-gloss subset, we achieve a recognition rate of 43.78%, a relative improvement of 3.8%. On the LSA64 dataset, we report a test recognition accuracy of 100%.
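The abstract says hand poses are processed in a local coordinate system independent of the body pose, but it does not spell out the procedure. The following is a minimal Python sketch of one way such a per-hand normalization could work, assuming each hand's 2D landmarks are mapped into the unit square through a square bounding box computed from that hand alone. The function name `normalize_hand` and the square-box centering rule are hypothetical illustrations, not the authors' published implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_hand(points: List[Point]) -> List[Point]:
    """Map one hand's 2D landmarks into [0, 1] x [0, 1].

    Hypothetical sketch: the box is square (side = larger of the x/y extents),
    so the hand's aspect ratio is preserved, and the shorter extent is centred
    inside the box. Only the hand's own landmarks are used, so the result does
    not depend on where the hand sits relative to the body.
    """
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    side = max(max_x - min_x, max_y - min_y)
    if side == 0:
        # Degenerate case (all landmarks collapsed to one point): place at centre.
        return [(0.5, 0.5) for _ in points]
    # Shift the origin so the shorter dimension is centred in the square box.
    off_x = min_x - (side - (max_x - min_x)) / 2
    off_y = min_y - (side - (max_y - min_y)) / 2
    return [((x - off_x) / side, (y - off_y) / side) for x, y in points]


if __name__ == "__main__":
    hand = [(10.0, 20.0), (14.0, 20.0), (12.0, 28.0)]
    print(normalize_hand(hand))  # [(0.25, 0.0), (0.75, 0.0), (0.5, 1.0)]
```

Because the box is derived from the hand landmarks only, translating or uniformly scaling the hand in the image leaves the normalized coordinates unchanged, which is the kind of body-independent representation the abstract describes.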
Pages: 182-191
Number of pages: 10