Evaluation of hidden Markov models using deep CNN features in isolated sign recognition

Cited by: 7
Authors
Tur, Anil Osman [1]
Keles, Hacer Yalim [1]
Affiliations
[1] Ankara Univ, Comp Engn Dept, Ankara, Turkey
Keywords
Isolated sign recognition; Gesture recognition; CNN; LSTM; HMM; GMM-HMM; Deep learning; Language recognition
DOI
10.1007/s11042-021-10593-w
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Isolated sign recognition from video streams is a challenging problem due to the multi-modal nature of signs, where local and global hand features and facial gestures need to be attended to simultaneously. This problem has recently been studied widely using deep Convolutional Neural Network (CNN) based features and Long Short-Term Memory (LSTM) based deep sequence models. However, the current literature lacks empirical analysis of Hidden Markov Models (HMMs) with deep features. In this study, we provide a framework composed of three modules to solve the isolated sign recognition problem using different sequence models. The dimensions of deep features are usually too large to work with HMMs. To solve this problem, we propose two alternative CNN-based architectures as the second module in our framework to reduce deep feature dimensions effectively. After extensive experiments, we show that, using pretrained ResNet50 features and one of our CNN-based dimension reduction models, HMMs can classify isolated signs with 90.15% accuracy on the Montalbano dataset using RGB and skeletal data. This performance is comparable to that of current LSTM-based models. HMMs have fewer parameters and can be trained and run quickly on commodity computers without requiring GPUs. Therefore, our analysis with deep features shows that HMMs could also be used, alongside deep sequence models, for the challenging isolated sign recognition problem.
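The last stage described in the abstract, classifying a sequence of reduced feature vectors with HMMs, can be illustrated with a minimal sketch. It assumes the common isolated-recognition setup of one HMM per sign class, with the label chosen by the highest sequence log-likelihood under the forward algorithm; the abstract does not confirm this exact decision rule. The sketch also uses a single diagonal Gaussian per state rather than a Gaussian mixture, and the function names, the dictionary layout and the toy parameters are all hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    # log of a probability, mapping 0 to -inf instead of raising
    return math.log(p) if p > 0 else NEG_INF

def logsumexp(values):
    # numerically stable log(sum(exp(v))) over a list of log values
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_gauss_diag(x, mean, var):
    # log density of one frame under a diagonal-covariance Gaussian
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def sequence_log_likelihood(frames, hmm):
    # forward algorithm in log space: log p(frames | hmm)
    pi, A = hmm["pi"], hmm["A"]
    means, variances = hmm["means"], hmm["vars"]
    n = len(pi)
    alpha = [safe_log(pi[s]) + log_gauss_diag(frames[0], means[s], variances[s])
             for s in range(n)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
                 + log_gauss_diag(x, means[j], variances[j])
                 for j in range(n)]
    return logsumexp(alpha)

def classify(frames, models):
    # pick the class whose HMM assigns the sequence the highest likelihood
    return max(models, key=lambda label: sequence_log_likelihood(frames, models[label]))

# Toy two-state left-to-right models over 2-D "reduced" features (made-up values).
models = {
    "sign_a": {"pi": [1.0, 0.0], "A": [[0.7, 0.3], [0.0, 1.0]],
               "means": [[0.0, 0.0], [1.0, 1.0]], "vars": [[0.5, 0.5], [0.5, 0.5]]},
    "sign_b": {"pi": [1.0, 0.0], "A": [[0.7, 0.3], [0.0, 1.0]],
               "means": [[1.0, 1.0], [0.0, 0.0]], "vars": [[0.5, 0.5], [0.5, 0.5]]},
}
frames = [[0.1, -0.1], [0.0, 0.2], [0.9, 1.1], [1.0, 0.8]]
predicted = classify(frames, models)
```

In a real pipeline the per-state parameters would be learned from training sequences (for example with Baum-Welch), and each frame would be a reduced deep feature vector rather than a hand-written 2-D point.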
Pages: 19137-19155
Number of pages: 19