Multimodal Gesture Recognition Using Multi-stream Recurrent Neural Network

Cited by: 37
Authors
Nishida, Noriki [1 ]
Nakayama, Hideki [1 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Machine Percept Grp, Tokyo, Japan
Source
IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015 | 2016 / Vol. 9431
Keywords
Multimodal gesture recognition; Recurrent neural networks; Long short-term memory; Convolutional neural networks;
DOI
10.1007/978-3-319-29451-3_54
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we present a novel method for multimodal gesture recognition based on neural networks. Our multi-stream recurrent neural network (MRNN) is a completely data-driven model that can be trained end to end without domain-specific hand engineering. The MRNN extends recurrent neural networks with Long Short-Term Memory cells (LSTM-RNNs), which facilitate the handling of variable-length gestures. We propose a recurrent approach for fusing multiple temporal modalities using multiple streams of LSTM-RNNs. In addition, we propose alternative fusion architectures and empirically evaluate the performance and robustness of these fusion strategies. Experimental results demonstrate that the proposed MRNN outperforms other state-of-the-art methods on the Sheffield Kinect Gesture (SKIG) dataset and is significantly more robust to noisy inputs.
Pages: 682-694
Number of pages: 13
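To make the multi-stream fusion idea described in the abstract concrete, below is a minimal conceptual sketch in Python/PyTorch. It is not the authors' implementation: the per-modality feature dimensions, layer widths, and the simple concatenation-based fusion LSTM are illustrative assumptions, whereas the paper proposes a specific recurrent fusion approach and evaluates alternative fusion architectures.

# Conceptual sketch only (not the authors' code): one LSTM stream per input
# modality, fused by a second LSTM over the concatenated per-step outputs.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class MultiStreamRNN(nn.Module):
    def __init__(self, modality_dims, hidden_size=128, num_classes=10):
        super().__init__()
        # One LSTM stream per modality (e.g. RGB, depth, skeleton features).
        self.streams = nn.ModuleList(
            [nn.LSTM(dim, hidden_size, batch_first=True) for dim in modality_dims]
        )
        # Fusion LSTM reads the concatenated stream outputs at every time step.
        self.fusion = nn.LSTM(hidden_size * len(modality_dims), hidden_size,
                              batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, inputs):
        # inputs: list of tensors, each of shape (batch, time, modality_dim)
        stream_outs = [lstm(x)[0] for lstm, x in zip(self.streams, inputs)]
        fused, _ = self.fusion(torch.cat(stream_outs, dim=-1))
        # Classify from the last fused hidden state; padding/packing for
        # variable-length sequences is omitted for brevity.
        return self.classifier(fused[:, -1])

# Toy usage with two made-up modalities over 16 frames.
model = MultiStreamRNN(modality_dims=[64, 32])
rgb_feat, depth_feat = torch.randn(4, 16, 64), torch.randn(4, 16, 32)
logits = model([rgb_feat, depth_feat])  # shape: (4, 10)

The toy usage at the bottom yields a (4, 10) logits tensor, one score per gesture class for each of the four sequences in the batch; the recurrent fusion layer is what lets information from all modalities interact at every time step rather than only at the final classification stage.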