Evaluation of hidden Markov models using deep CNN features in isolated sign recognition

Cited by: 7

Authors:

Tur, Anil Osman [1]

Keles, Hacer Yalim [1]

Affiliations:

[1] Ankara Univ, Comp Engn Dept, Ankara, Turkey

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2021 / Vol. 80 / Issue 13

Keywords:

Isolated sign recognition; Gesture recognition; CNN; LSTM; HMM; GMM-HMM; Deep learning; LANGUAGE RECOGNITION;

DOI:

10.1007/s11042-021-10593-w

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Isolated sign recognition from video streams is a challenging problem due to the multi-modal nature of signs, where local and global hand features and facial gestures need to be attended to simultaneously. This problem has recently been studied widely using deep Convolutional Neural Network (CNN) based features and Long Short-Term Memory (LSTM) based deep sequence models. However, the current literature lacks empirical analysis of Hidden Markov Models (HMMs) with deep features. In this study, we provide a framework composed of three modules to solve the isolated sign recognition problem using different sequence models. The dimensions of deep features are usually too large to work with HMMs. To solve this problem, we propose two alternative CNN based architectures as the second module in our framework to reduce deep feature dimensions effectively. After extensive experiments, we show that, using pretrained ResNet50 features and one of our CNN based dimension reduction models, HMMs can classify isolated signs with 90.15% accuracy on the Montalbano dataset using RGB and skeletal data. This performance is comparable with current LSTM based models. HMMs have fewer parameters and can be trained and run quickly on commodity computers, without requiring GPUs. Therefore, our analysis with deep features shows that HMMs, like deep sequence models, could be utilized for the challenging isolated sign recognition problem.
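
The abstract does not state how the HMMs produce a class label. A common arrangement for isolated recognition, assumed here and not confirmed by the record, is to keep one HMM per sign class and pick the class whose model gives the observed feature sequence the highest likelihood. The sketch below illustrates that rule with the forward algorithm in log space. Everything in it is hypothetical: the model parameters are set by hand rather than trained, each state uses a single diagonal Gaussian instead of a GMM, and the 2-D toy vectors merely stand in for dimension-reduced CNN features.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def hmm_log_likelihood(seq, model):
    """Forward algorithm: log P(seq | model) for a Gaussian-emission HMM."""
    n = len(model["start"])

    def emit(state, x):
        return log_gaussian_diag(x, model["means"][state], model["vars"][state])

    # Initialisation: start probability times emission of the first frame.
    alpha = [safe_log(model["start"][s]) + emit(s, seq[0]) for s in range(n)]
    # Recursion: sum over predecessor states, then emit the current frame.
    for x in seq[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(model["trans"][i][j]) for i in range(n)])
            + emit(j, x)
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(seq, models):
    """Return the label of the class model with the highest likelihood."""
    return max(models, key=lambda label: hmm_log_likelihood(seq, models[label]))


def left_to_right_model(means):
    """Hypothetical 2-state left-to-right HMM with hand-set parameters."""
    return {
        "start": [1.0, 0.0],
        "trans": [[0.5, 0.5], [0.0, 1.0]],
        "means": means,
        "vars": [[0.1, 0.1], [0.1, 0.1]],
    }


# Two toy "sign" classes: features move from low to high, or high to low.
models = {
    "rising": left_to_right_model([[0.0, 0.0], [1.0, 1.0]]),
    "falling": left_to_right_model([[1.0, 1.0], [0.0, 0.0]]),
}

rising_seq = [[0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.1, 0.9]]
falling_seq = rising_seq[::-1]

predicted_rising = classify(rising_seq, models)
predicted_falling = classify(falling_seq, models)
```

Scoring a sequence costs on the order of (number of frames) x (number of states squared) operations per class, which is consistent with the abstract's remark that HMMs can run on commodity hardware without GPUs.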

Pages: 19137 - 19155

Page count: 19

34 references in total

[21] Murakami K., 1991, Human Factors in Computing Systems. Reaching Through Technology. CHI '91. Conference Proceedings, P237, DOI 10.1145/108844.108900
[22] ModDrop: Adaptive Multi-Modal Gesture Recognition
Neverova, Natalia
Wolf, Christian
Taylor, Graham
Nebout, Florian
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016, 38 (08) : 1692 - 1706
[23] Multimodal Gesture Recognition Using Multi-stream Recurrent Neural Network
Nishida, Noriki
Nakayama, Hideki
[J]. IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015, 2016, 9431 : 682 - 694
[24] Convolutional Neural Networks and Long Short-Term Memory for skeleton-based human activity and hand gesture recognition
Nunez, Juan C.
Cabido, Raul
Pantrigo, Juan J.
Montemayor, Antonio S.
Velez, Jose F.
[J]. PATTERN RECOGNITION, 2018, 76 : 80 - 94
[25] Paszke A., 2017, NIPS W
[26] Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
Pigou, Lionel
van den Oord, Aaron
Dieleman, Sander
Van Herreweghe, Mieke
Dambre, Joni
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2018, 126 (2-4) : 430 - 439
[27] Sign Language Recognition Using Convolutional Neural Networks
Pigou, Lionel
Dieleman, Sander
Kindermans, Pieter-Jan
Schrauwen, Benjamin
[J]. COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 572 - 578
[28] Recent methods and databases in vision-based hand gesture recognition: A review
Pisharady, Pramod Kumar
Saerbeck, Martin
[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2015, 141 : 152 - 165
[29] ImageNet Large Scale Visual Recognition Challenge
Russakovsky, Olga
Deng, Jia
Su, Hao
Krause, Jonathan
Satheesh, Sanjeev
Ma, Sean
Huang, Zhiheng
Karpathy, Andrej
Khosla, Aditya
Bernstein, Michael
Berg, Alexander C.
Fei-Fei, Li
[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2015, 115 (03) : 211 - 252
[30] Schreiber J, 2018, ARXIV171100131711001
