Multimodal Gesture Recognition Using Multi-stream Recurrent Neural Network

Cited by: 37
Authors
Nishida, Noriki [1 ]
Nakayama, Hideki [1 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Machine Percept Grp, Tokyo, Japan
Source
IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015 | 2016 / Vol. 9431
Keywords
Multimodal gesture recognition; Recurrent neural networks; Long short-term memory; Convolutional neural networks;
DOI
10.1007/978-3-319-29451-3_54
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we present a novel method for multimodal gesture recognition based on neural networks. Our multi-stream recurrent neural network (MRNN) is a completely data-driven model that can be trained end to end without domain-specific hand engineering. The MRNN extends recurrent neural networks with Long Short-Term Memory cells (LSTM-RNNs), which facilitate the handling of variable-length gestures. We propose a recurrent approach for fusing multiple temporal modalities using multiple streams of LSTM-RNNs. In addition, we propose alternative fusion architectures and empirically evaluate the performance and robustness of these fusion strategies. Experimental results demonstrate that the proposed MRNN outperforms other state-of-the-art methods on the Sheffield Kinect Gesture (SKIG) dataset and is significantly more robust to noisy inputs.
Pages: 682-694
Number of pages: 13
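To make the multi-stream fusion idea described in the abstract concrete, below is a minimal conceptual sketch in Python/PyTorch. It is not the authors' implementation: the per-modality feature dimensions, layer widths, and the simple concatenation-based fusion LSTM are illustrative assumptions, whereas the paper proposes a specific recurrent fusion approach and evaluates alternative fusion architectures.

# Conceptual sketch only (not the authors' code): one LSTM stream per input
# modality, fused by a second LSTM over the concatenated per-step outputs.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class MultiStreamRNN(nn.Module):
    def __init__(self, modality_dims, hidden_size=128, num_classes=10):
        super().__init__()
        # One LSTM stream per modality (e.g. RGB, depth, skeleton features).
        self.streams = nn.ModuleList(
            [nn.LSTM(dim, hidden_size, batch_first=True) for dim in modality_dims]
        )
        # Fusion LSTM reads the concatenated stream outputs at every time step.
        self.fusion = nn.LSTM(hidden_size * len(modality_dims), hidden_size,
                              batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, inputs):
        # inputs: list of tensors, each of shape (batch, time, modality_dim)
        stream_outs = [lstm(x)[0] for lstm, x in zip(self.streams, inputs)]
        fused, _ = self.fusion(torch.cat(stream_outs, dim=-1))
        # Classify from the last fused hidden state; padding/packing for
        # variable-length sequences is omitted for brevity.
        return self.classifier(fused[:, -1])

# Toy usage with two made-up modalities over 16 frames.
model = MultiStreamRNN(modality_dims=[64, 32])
rgb_feat, depth_feat = torch.randn(4, 16, 64), torch.randn(4, 16, 32)
logits = model([rgb_feat, depth_feat])  # shape: (4, 10)

The toy usage at the bottom yields a (4, 10) logits tensor, one score per gesture class for each of the four sequences in the batch; the recurrent fusion layer is what lets information from all modalities interact at every time step rather than only at the final classification stage.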