Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM

Cited by: 196

Authors:

Zhu G. [1]

Zhang L. [1]

Shen P. [1]

Song J. [1]

Affiliation:

[1] School of Software, Xidian University, Xi'an

Source:

IEEE Access | 2017 / Vol. 5

Keywords:

3-D convolution; convolutional LSTM; gesture recognition; multimodal

DOI:

10.1109/ACCESS.2017.2684186

Abstract:

Gesture recognition aims to recognize meaningful movements of the human body and is of utmost importance in intelligent human-computer and human-robot interaction. In this paper, we present a multimodal gesture recognition method based on 3-D convolution and convolutional long short-term memory (LSTM) networks. The proposed method first learns short-term spatiotemporal features of gestures with a 3-D convolutional neural network, and then learns long-term spatiotemporal features with convolutional LSTM networks applied to the extracted short-term features. In addition, fine-tuning among multimodal data is evaluated, and we find that it can serve as an optional technique to prevent overfitting when no pre-trained models exist. The proposed method is verified on the ChaLearn LAP large-scale isolated gesture data set (IsoGD) and the Sheffield Kinect gesture (SKIG) data set. The results show that the proposed method achieves state-of-the-art recognition accuracy (51.02% on the validation set of IsoGD and 98.89% on SKIG). © 2017 IEEE.
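
The second stage named in the abstract, a convolutional LSTM, can be illustrated with a minimal sketch. It assumes the common peephole-free ConvLSTM formulation, a single channel, and small hand-written kernels; none of these choices come from this record, and the paper's actual layer sizes and variant are not stated here. The input maps stand in for the short-term feature maps that a 3-D CNN would produce. The function names `conv2d_same` and `convlstm_step` are hypothetical.

```python
import math

def conv2d_same(x, k):
    """Cross-correlate grid x with an odd-sized kernel k, zero padded ('same' size)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:  # outside the grid counts as zero
                        s += x[y][z] * k[a][b]
            out[i][j] = s
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def convlstm_step(x, h_prev, c_prev, kernels):
    """One ConvLSTM time step on single-channel 2-D maps (assumed peephole-free form).

    kernels maps a gate name ("i", "f", "o", "g") to a tuple
    (input kernel, hidden-state kernel, scalar bias).
    """
    rows, cols = len(x), len(x[0])

    def gate(name, act):
        wx, wh, b = kernels[name]
        a = conv2d_same(x, wx)        # contribution of the current input map
        r = conv2d_same(h_prev, wh)   # contribution of the previous hidden map
        return [[act(a[p][q] + r[p][q] + b) for q in range(cols)] for p in range(rows)]

    i_g = gate("i", sigmoid)    # input gate
    f_g = gate("f", sigmoid)    # forget gate
    o_g = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)    # candidate cell update
    c = [[f_g[p][q] * c_prev[p][q] + i_g[p][q] * g[p][q] for q in range(cols)]
         for p in range(rows)]
    h = [[o_g[p][q] * math.tanh(c[p][q]) for q in range(cols)] for p in range(rows)]
    return h, c

# Usage: run two illustrative 3x3 "feature maps" through the cell in sequence.
blur = [[0.1] * 3 for _ in range(3)]
kernels = {name: (blur, blur, 0.0) for name in "ifog"}
state_h = [[0.0] * 3 for _ in range(3)]
state_c = [[0.0] * 3 for _ in range(3)]
frames = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
]
for frame in frames:
    state_h, state_c = convlstm_step(frame, state_h, state_c, kernels)
```

Because every gate is computed by convolution rather than by a fully connected product, the hidden and cell states keep the spatial layout of the input maps, which is what lets this kind of layer model long-term spatiotemporal structure.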

Citation

Pages: 4517-4524

Page count: 7
