Chinese Sign Language Recognition with Sequence to Sequence Learning

Cited by: 11
Authors
Mao, Chensi [1]
Huang, Shiliang [1]
Li, Xiaoxu [1]
Ye, Zhongfu [1]
Affiliation
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Dept Elect Engn & Informat Sci, Hefei 230027, Anhui, Peoples R China
Source
COMPUTER VISION, PT I | 2017 / Vol. 771
Keywords
Sign language recognition; Long short-term memory; Convolutional neural network; Trajectory;
DOI
10.1007/978-981-10-7299-4_15
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we formulate Chinese sign language recognition (SLR) as a sequence-to-sequence problem and propose an encoder-decoder framework to handle it. The framework is built on a convolutional neural network (CNN) and a recurrent neural network (RNN) with long short-term memory (LSTM). Specifically, the CNN extracts the spatial features of the input frames, and two cascaded LSTM layers implement the encoder-decoder structure. The encoder-decoder learns not only the temporal information of the input features but also the context model of sign language words. We feed the image sequences captured by Microsoft Kinect 2.0 into the network to build an end-to-end model. We also set up a second model that uses skeletal coordinates as the input of the encoder-decoder framework. In the recognition stage, a probability combination method is proposed to fuse these two models and obtain the final prediction. We validate our method on a self-built dataset, and the experimental results demonstrate its effectiveness.
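The abstract does not say how the probability combination works, so the following is only a minimal sketch of one plausible form: a weighted average of the per-step word probabilities produced by the image model and the skeleton model. The function names, the `weight` parameter, and the three-word vocabulary are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_probabilities(p_image: Sequence[float],
                       p_skeleton: Sequence[float],
                       weight: float = 0.5) -> List[float]:
    """Weighted average of two class-probability vectors.

    ASSUMPTION: the paper's actual combination rule is not given in the
    abstract; a convex combination is used here purely for illustration.
    """
    if len(p_image) != len(p_skeleton):
        raise ValueError("both models must score the same vocabulary")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    fused = [weight * a + (1.0 - weight) * b
             for a, b in zip(p_image, p_skeleton)]
    total = sum(fused)
    if total == 0.0:
        raise ValueError("fused probabilities sum to zero")
    # Renormalise so the result is again a probability distribution.
    return [x / total for x in fused]


def predict_sequence(image_probs: Sequence[Sequence[float]],
                     skeleton_probs: Sequence[Sequence[float]],
                     vocab: Sequence[str],
                     weight: float = 0.5) -> List[str]:
    """Pick the most probable word at each decoder step after fusion."""
    words = []
    for p_i, p_s in zip(image_probs, skeleton_probs):
        fused = fuse_probabilities(p_i, p_s, weight)
        best = max(range(len(fused)), key=fused.__getitem__)
        words.append(vocab[best])
    return words


# Hypothetical vocabulary and decoder outputs for two time steps.
vocab = ["hello", "thanks", "friend"]
image_probs = [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
skeleton_probs = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
prediction = predict_sequence(image_probs, skeleton_probs, vocab)
```

With equal weights the skeleton model overrides the image model at the first step, which is the kind of disagreement a fusion rule is meant to resolve; in practice the weight would be tuned on held-out data.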
Pages: 180-191
Page count: 12