Neural Sign Language Translation Based on Human Keypoint Estimation

Cited by: 108

Authors:

Ko, Sang-Ki [1]

Kim, Chang Jo [1]

Jung, Hyedong [1]

Cho, Choongsang [1]

Affiliations:

[1] Korea Elect Technol Inst, Seongnam 13488, South Korea

Source:

APPLIED SCIENCES-BASEL | 2019 / Volume 9 / Issue 13

Keywords:

sign language translation; human keypoint detection; deep learning; sequence-to-sequence model

DOI:

10.3390/app9132683

Chinese Library Classification number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

We propose a sign language translation system based on human keypoint estimation. It is well known that many problems in computer vision require a massive dataset to train deep neural network models. The situation is even worse for sign language translation, because high-quality training data are far more difficult to collect. In this paper, we introduce the KETI (Korea Electronics Technology Institute) sign language dataset, which consists of 14,672 high-resolution, high-quality videos. Because each country has its own unique sign language, the KETI sign language dataset can serve as a starting point for further research on Korean sign language translation. Using the KETI sign language dataset, we develop a neural network model that translates sign videos into natural language sentences by utilizing human keypoints extracted from the face, hands, and body parts. The obtained human keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to our translation model, which is based on the sequence-to-sequence architecture. We show that our approach is robust even when the size of the training data is not sufficient. Our translation model achieved 93.28% translation accuracy on the validation set and 55.28% on the test set for 105 sentences that can be used in emergency situations. We compared several variants of our neural sign translation model, based on different attention mechanisms, in terms of classical metrics for measuring translation performance.
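The normalization step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that the keypoint vector is normalized by the mean and standard deviation of the keypoints; the per-frame scope, the separate treatment of x and y coordinates, the `eps` guard, and the name `normalize_keypoints` are all assumptions made here for illustration and are not taken from the record.

```python
from statistics import fmean, pstdev


def normalize_keypoints(keypoints, eps=1e-8):
    """Standardize one frame of 2D keypoints (hypothetical helper).

    keypoints: list of (x, y) coordinates for a single video frame.
    Returns a flat list [x'_1..x'_n, y'_1..y'_n] in which the x values
    and the y values each have zero mean and (near) unit variance.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]

    def standardize(values):
        mean = fmean(values)
        std = pstdev(values)  # population standard deviation
        # eps is an assumed guard against division by zero
        return [(v - mean) / (std + eps) for v in values]

    return standardize(xs) + standardize(ys)


# Three made-up keypoints in pixel coordinates
frame = [(100.0, 200.0), (120.0, 220.0), (140.0, 240.0)]
normalized = normalize_keypoints(frame)
```

Standardizing in this way removes the dependence on where the signer stands in the frame and on the signer's apparent size, which is one plausible reason such a step would help when training data are limited. A sequence of such vectors, one per frame, would then form the input sequence of a sequence-to-sequence model.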


Pages: 19
