HandSense: smart multimodal hand gesture recognition based on deep neural networks

Cited by: 14
Authors
Zhang Z. [1 ]
Tian Z. [1 ]
Zhou M. [1 ]
Affiliation
[1] Chongqing University of Posts and Telecommunications, Chongqing
Funding
National Natural Science Foundation of China
Keywords
Fine-grained gestures; Hand gesture recognition; HandSense; Spatial–temporal features
DOI
10.1007/s12652-018-0989-7
Abstract
Hand gesture recognition (HGR) is a promising enabler for human–computer interaction (HCI). Hand gestures are normally classified into multi-modal actions, including static gestures, fine-grained dynamic gestures, and coarse-grained dynamic gestures. Among these, fine-grained action detection is limited by the small scale of the relevant image region. To address this problem, we propose HandSense, a new multi-modal HGR system based on combined RGB and depth cameras that improves fine-grained action descriptors while preserving the ability to perform general action recognition. First, two interconnected 3D convolutional neural networks (3D-CNNs) are employed to extract spatial–temporal features from the RGB and depth images. Second, these spatial–temporal features are integrated into a fused feature. Finally, a Support Vector Machine (SVM) recognizes different gestures based on the fused feature. To validate the effectiveness of HandSense, extensive experiments are conducted on a public gesture dataset, the SKIG hand gesture dataset. In addition, the feasibility of the proposed system is demonstrated on a challenging multi-modal RGB-Depth hand gesture dataset. © Springer-Verlag GmbH Germany, part of Springer Nature 2018.
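The abstract describes a three-stage pipeline (two-stream 3D-CNN feature extraction, feature fusion, SVM classification). The sketch below is only an illustration of that pipeline shape, not the authors' implementation: the layer sizes, concatenation-based fusion, RBF kernel, and all variable names are assumptions, and PyTorch plus scikit-learn are assumed stand-ins for whatever framework the paper actually uses.

```python
# Illustrative sketch (not the HandSense code): two 3D-CNN streams for RGB and
# depth clips, feature-level fusion by concatenation, and an SVM classifier.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class Small3DCNN(nn.Module):
    """Minimal 3D-CNN mapping a video clip to a fixed-length spatial-temporal feature."""
    def __init__(self, in_channels, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pooling over time and space
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, clip):  # clip: (N, C, T, H, W)
        return self.fc(self.features(clip).flatten(1))

# One stream per modality: RGB clips have 3 channels, depth clips have 1.
rgb_net, depth_net = Small3DCNN(3), Small3DCNN(1)

def fused_feature(rgb_clip, depth_clip):
    """Concatenate the two streams' features into a single fused descriptor."""
    with torch.no_grad():
        return torch.cat([rgb_net(rgb_clip), depth_net(depth_clip)], dim=1)

# Toy example: 8 random 16-frame clips at 32x32 with dummy gesture labels.
rgb = torch.randn(8, 3, 16, 32, 32)
depth = torch.randn(8, 1, 16, 32, 32)
labels = [0, 1, 0, 1, 2, 2, 0, 1]

X = fused_feature(rgb, depth).numpy()
svm = SVC(kernel="rbf").fit(X, labels)  # SVM classifies gestures from fused features
print(svm.predict(X[:2]))
```

In this reading, fusion happens at the feature level before classification, so the SVM can be retrained on new gesture sets without retraining the 3D-CNN extractors; the paper's actual fusion strategy may differ.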
Pages: 1557–1572 (15 pages)