Utilizing motion and spatial features for sign language gesture recognition using cascaded CNN and LSTM models

Cited by: 5
Authors
Luqman, Hamzah [1 ]
El-Alfy, El-Sayed M. [1 ]
Affiliations
[1] King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Intelligent Secure Syst, SDAIA KFUPM Joint Res Ctr Artificial Intelligence, Coll Comp & Math,Dept Informat & Comp Sci, Dhahran, Saudi Arabia
Keywords
Sign language recognition; gesture recognition; sign language translation; action recognition; Arabic sign language recognition; CNN-LSTM; DISCRETE WAVELET TRANSFORM; FEATURE-EXTRACTION; NETWORK;
DOI
10.55730/1300-0632.3952
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Sign language is a visual language produced by body gestures and facial expressions. The aim of an automatic sign language recognition system is to assign meaning to each sign gesture. Recently, several computer vision systems have been proposed for sign language recognition using a variety of recognition techniques, sign languages, and gesture modalities. However, a key challenge remains the preprocessing, segmentation, extraction, and tracking of the relevant static and dynamic features of manual and nonmanual gestures across a sequence of frames. In this paper, we study the efficiency, scalability, and computation time of three cascaded convolutional neural network (CNN) and long short-term memory (LSTM) architectures for the recognition of dynamic sign language gestures. The spatial features of dynamic signs are captured by a CNN and fed into a multilayer stacked LSTM that learns the temporal information. To capture motion across video frames, the absolute temporal differences between consecutive frames are computed and fed into the recognition system. Several experiments were conducted on three benchmark datasets covering two sign languages to evaluate the proposed models, which we also compared with other techniques. The results show that our models capture spatio-temporal features well suited to recognizing diverse sign language gestures and consistently outperform other techniques, achieving over 99% accuracy.
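The motion cue described in the abstract, absolute temporal differences between consecutive frames, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and array layout (frames stacked along the first axis) are assumptions, and the differenced output would then be fed, alongside the raw frames, to the CNN-LSTM cascade.

```python
import numpy as np

def motion_frames(frames):
    """Absolute temporal differences between consecutive video frames.

    frames: array of shape (T, H, W) or (T, H, W, C) with T frames.
    Returns an array of shape (T-1, ...) where entry t is |frame[t+1] - frame[t]|.
    """
    frames = np.asarray(frames, dtype=np.float32)
    # np.diff subtracts consecutive frames along the time axis;
    # the absolute value keeps motion magnitude regardless of direction.
    return np.abs(np.diff(frames, axis=0))

# Example: a 5-frame grayscale clip where only frame 2 changes.
clip = np.zeros((5, 4, 4), dtype=np.float32)
clip[2] = 1.0
diffs = motion_frames(clip)  # shape (4, 4, 4); motion appears at t=1 and t=2
```

A static scene yields all-zero difference frames, so this representation suppresses the background and highlights the moving hands and face, which is what makes it a useful complement to the raw spatial frames.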
Pages: 2508-2525 (19 pages)