One-shot learning hand gesture recognition based on modified 3D convolutional neural networks

Cited by: 2
Authors
Zhi Lu
Shiyin Qin
Xiaojie Li
Lianwei Li
Dinghao Zhang
Affiliations
[1] Beihang University,School of Automation Science and Electrical Engineering
Source
Machine Vision and Applications | 2019 / Vol. 30
关键词
One-shot learning hand gesture recognition; Convolutional neural networks (CNN); Multimodal feature fusion; Continuous fine-tune; Transfer learning;
DOI
Not available
Abstract
Although deep neural networks play an important role in vision-based hand gesture recognition, acquiring large numbers of annotated samples to support their training remains challenging. Furthermore, practical applications often present cases with only a single sample for a new gesture class, so that conventional recognition methods cannot achieve satisfactory classification performance. In this paper, transfer learning is employed to build an effective one-shot learning network architecture to address this problem: knowledge gained from deep training on a large dataset of related objects is transferred to strengthen one-shot learning hand gesture recognition (OSLHGR), rather than training a network from scratch. Following this idea, a well-designed convolutional network with deeper layers, C3D (Tran et al. in: ICCV, pp 4489–4497, 2015), is modified into an effective tool for extracting spatiotemporal features by deep learning. Continuous fine-tuning is then performed on a single sample of each new class to complete one-shot learning, and classification is carried out with both a Softmax classifier and a geometric classifier based on Euclidean distance. Finally, a series of experiments on two benchmark datasets, VIVA (Vision for Intelligent Vehicles and Applications) and SKIG (Sheffield Kinect Gesture), demonstrates the state-of-the-art recognition accuracy of the proposed method. In addition, a special gesture dataset, BSG, is built with a SoftKinetic DS325 camera for testing OSLHGR, and the test results verify its good classification performance and real-time response speed.
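The geometric classification step described in the abstract can be sketched as a nearest-neighbor decision in feature space: each new gesture class is represented by the embedding of its single support sample, and a query is assigned to the class whose embedding is closest in Euclidean distance. The following is a minimal illustrative sketch, not the authors' implementation; the function name, feature dimensionality, and toy values are hypothetical stand-ins for the spatiotemporal features a C3D-style network would extract.

```python
import numpy as np

def classify_by_euclidean(query_feat, support_feats):
    """Assign the query embedding to the class whose one-shot
    support embedding lies closest in Euclidean distance."""
    dists = [np.linalg.norm(query_feat - s) for s in support_feats]
    return int(np.argmin(dists))

# Toy example: 3 gesture classes, each with one 4-dim support embedding.
support = [np.array([1.0, 0.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0, 0.0]),
           np.array([0.0, 0.0, 1.0, 0.0])]
query = np.array([0.9, 0.1, 0.0, 0.0])
print(classify_by_euclidean(query, support))  # → 0
```

In a one-shot setting this kind of distance-based rule needs no retraining of the classifier head when a new class arrives, which is why it is paired with the Softmax classifier in the evaluation.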
Pages: 1157–1180
Page count: 23