One-shot learning gesture recognition based on joint training of 3D ResNet and memory module

Cited by: 12
Authors
Li, Lianwei [1]
Qin, Shiyin [1, 2]
Lu, Zhi [1]
Xu, Kuanhong [3]
Hu, Zhongying [3]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China
[2] Dongguan Univ Technol, Sch Elect Engn & Intelligentizat, Dongguan 523808, Guangdong, Peoples R China
[3] Sony China Res Lab, Artificial Intelligence Res Dept, Beijing 100028, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Gesture recognition; One-shot learning; Joint training; 3D ResNet; Memory module; RGB-D DATA; DATASET;
DOI
10.1007/s11042-019-08429-9
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Hand gesture recognition is a research hotspot in human-machine interaction, and it has progressed considerably with the development of deep learning. However, deep-learning-based recognition methods rely heavily on large-scale labeled datasets, which are very hard to build in practical applications. To achieve good performance under the strict constraint of few samples, this paper studies one-shot learning gesture recognition and presents a joint deep training method that combines a 3D ResNet with a memory module. The scheme jointly optimizes the feature extraction of the 3D ResNet and the memory module's capacity to remember rare events, using an effective optimal decision strategy and two relative performance indices. To implement one-shot learning gesture recognition, the memory module stores the features extracted by the well-trained 3D ResNet, and the classification decision is made by the nearest neighbor algorithm with a cosine similarity measure. Because the ability to handle negative samples plays a significant role in real-world human-machine interaction, a mechanism based on a cosine similarity threshold is built to perform classification and rejection. To validate and evaluate the proposed method, a dedicated hand gesture dataset containing 3045 gesture videos is built, and a series of experiments on this collected dataset and on public datasets demonstrates the feasibility and effectiveness of the method.
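The decision step described in the abstract (nearest neighbor over stored features with cosine similarity, plus rejection below a similarity threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional feature vectors, the gesture labels, and the threshold value 0.8 are all assumptions made for illustration, and the 3D ResNet feature extractor and the trained memory module are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def classify_or_reject(query, memory, threshold=0.8):
    """Nearest-neighbor decision with rejection.

    memory is a list of (feature_vector, label) pairs, one per stored
    example. Returns (label, similarity); label is None when the best
    similarity falls below the threshold, i.e. the query is rejected
    as a negative sample. The threshold 0.8 is an assumed value.
    """
    best_label, best_sim = None, -1.0
    for key, label in memory:
        sim = cosine_similarity(query, key)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < threshold:
        return None, best_sim
    return best_label, best_sim


# Hypothetical memory: one stored feature vector per gesture class.
memory = [
    ([1.0, 0.0, 0.0], "swipe_left"),
    ([0.0, 1.0, 0.0], "swipe_right"),
]

accepted = classify_or_reject([0.9, 0.1, 0.0], memory)
rejected = classify_or_reject([0.0, 0.0, 1.0], memory)
```

In this toy setup the first query lies close to the stored "swipe_left" vector and is accepted, while the second is orthogonal to every stored vector and is rejected. In practice the threshold would have to be tuned on held-out data, since it trades false acceptances against false rejections.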
Pages: 6727-6757
Number of pages: 31