Evaluation of hidden Markov models using deep CNN features in isolated sign recognition

Cited by: 7
Authors
Tur, Anil Osman [1]
Keles, Hacer Yalim [1]
Affiliation
[1] Ankara Univ, Comp Engn Dept, Ankara, Turkey
Keywords
Isolated sign recognition; Gesture recognition; CNN; LSTM; HMM; GMM-HMM; Deep learning; Language recognition
DOI
10.1007/s11042-021-10593-w
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Isolated sign recognition from video streams is a challenging problem due to the multi-modal nature of signs, in which local and global hand features as well as facial gestures must be attended to simultaneously. This problem has recently been studied extensively using deep Convolutional Neural Network (CNN) features and deep sequence models based on Long Short-Term Memory (LSTM). However, the current literature lacks an empirical analysis of Hidden Markov Models (HMMs) with deep features. In this study, we provide a framework composed of three modules that solves the isolated sign recognition problem using different sequence models. Deep features are usually too high-dimensional to work with HMMs. To address this, we propose two alternative CNN-based architectures, forming the second module of the framework, that reduce the dimension of deep features effectively. Through extensive experiments, we show that using pretrained ResNet50 features and one of our CNN-based dimension reduction models, HMMs can classify isolated signs with 90.15% accuracy on the Montalbano dataset using RGB and skeletal data. This performance is comparable to that of current LSTM-based models. HMMs have fewer parameters and can be trained and run quickly on commodity computers without requiring GPUs. Therefore, our analysis with deep features shows that HMMs can also be used, alongside deep sequence models, for the challenging isolated sign recognition problem.
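The final stage of such a pipeline scores a sequence of reduced per-frame deep features with one HMM per sign class. A minimal sketch follows, assuming a left-to-right topology and a single diagonal Gaussian per state (a one-component GMM-HMM); the record does not specify the paper's actual topology, mixture count, or training procedure. Classification picks the class whose model gives the highest forward-algorithm log-likelihood. The sign names, feature values, and the `stay` probability are hypothetical, and training (e.g. Baum-Welch) is omitted.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def safe_log(p: float) -> float:
    """log(p), with log(0) mapped to -inf for forbidden transitions."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian_diag(x: Frame, mean: Frame, var: Frame) -> float:
    """Log density of frame x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


class GaussianHMM:
    """HMM with one diagonal Gaussian per state (a one-component GMM-HMM)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 means: List[Frame], variances: List[Frame]) -> None:
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.means = means
        self.variances = variances

    def emit(self, state: int, x: Frame) -> float:
        """Log emission probability of frame x in the given state."""
        return log_gaussian_diag(x, self.means[state], self.variances[state])

    def log_likelihood(self, frames: List[Frame]) -> float:
        """Forward algorithm in log space: returns log P(frames | model)."""
        n = len(self.log_start)
        alpha = [self.log_start[s] + self.emit(s, frames[0]) for s in range(n)]
        for x in frames[1:]:
            alpha = [
                logsumexp([alpha[r] + self.log_trans[r][s] for r in range(n)])
                + self.emit(s, x)
                for s in range(n)
            ]
        return logsumexp(alpha)


def left_to_right_hmm(means: List[Frame], variances: List[Frame],
                      stay: float = 0.6) -> GaussianHMM:
    """Left-to-right topology: start in state 0; each state loops or advances."""
    n = len(means)
    start = [1.0] + [0.0] * (n - 1)
    trans = []
    for s in range(n):
        row = [0.0] * n
        if s < n - 1:
            row[s], row[s + 1] = stay, 1.0 - stay
        else:
            row[s] = 1.0  # last state is absorbing
        trans.append(row)
    return GaussianHMM(start, trans, means, variances)


def classify(frames: List[Frame], models: Dict[str, GaussianHMM]) -> str:
    """Return the sign label whose HMM assigns the highest log-likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(frames))


# Toy usage with hypothetical 2-D reduced features and two hypothetical signs.
unit = [1.0, 1.0]
models = {
    "hello": left_to_right_hmm([[0.0, 0.0], [3.0, 3.0]], [unit, unit]),
    "thanks": left_to_right_hmm([[3.0, 3.0], [0.0, 0.0]], [unit, unit]),
}
clip = [[0.1, -0.2], [0.0, 0.1], [2.9, 3.1], [3.2, 2.8]]
print(classify(clip, models))  # → hello
```

Working in log space keeps the forward recursion from underflowing on long clips. With D-dimensional inputs, each diagonal Gaussian needs 2D parameters, so the emission models grow directly with the feature dimension; this is one reason to reduce deep features before they reach the HMMs.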
Pages: 19137-19155
Page count: 19