Prediction of sign language recognition based on multi layered CNN

Cited by: 3
Authors
Prasath, G. Arun [1 ]
Annapurani, K. [2 ]
Affiliations
[1] SRM Inst Sci & Technol, Sch Comp, Dept Networking & Commun, Chengalpattu 603203, Tamilnadu, India
[2] SRM Inst Sci & Technol Kattankulathur, Sch Comp, Dept Networking & Commun, Chengalpattu 603203, Tamilnadu, India
Keywords
Sign language recognition; Deep learning; Multi-layer convolutional neural network; Linear and non-linear features; Higher-level and lower-level features; Precision; Accuracy; Gesture recognition; Speech
DOI
10.1007/s11042-023-14548-1
Chinese Library Classification (CLC) code
TP [Automation and computer technology]
Discipline classification code
0812
Abstract
Sign Language Recognition (SLR) helps to bridge the gap between hearing and hearing-impaired people, but SLR systems face various difficulties and challenges during real-time implementation. The major complexity associated with SLR is the inability to provide a consistent recognition process, which results in lower recognition accuracy. To address this issue, this research concentrates on adopting the best classification approach to provide a feasible end-to-end system using deep learning techniques. The process transforms sign language into voice so that others can understand it. The input is taken from the ROBITA Indian Sign Language Gesture Database, and essential pre-processing steps are applied to remove unnecessary artefacts. The proposed model incorporates an encoder with Multi-Layer Convolutional Neural Networks (ML-CNN) to evaluate the scalability and accuracy of the end-to-end SLR. The encoder analyses linear and non-linear features (higher-level and lower-level) to improve recognition quality. The simulation is carried out in a MATLAB environment, where the ML-CNN model outperforms the existing approaches and establishes the trade-off. Performance metrics such as accuracy, precision, F-measure, recall, Matthews Correlation Coefficient (MCC), and Mean Absolute Error (MAE) are evaluated to show the significance of the model. The prediction accuracy of the proposed ML-CNN with encoder is 87.5% on the ROBITA sign gesture dataset, an improvement of 1% and 3.5% over BLSTM and HMM respectively.
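The paper's MATLAB implementation is not reproduced here, but the evaluation metrics the abstract names (accuracy, precision, recall, F-measure, MCC, MAE) have standard definitions. A minimal sketch for the binary case follows; the function name `evaluate` and the 0/1 label encoding are illustrative assumptions, not from the paper:

```python
import math

def evaluate(y_true, y_pred):
    """Compute the metrics listed in the abstract for binary 0/1 labels."""
    # Confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # Matthews Correlation Coefficient
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / den if den else 0.0
    # Mean Absolute Error over the 0/1 labels
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc, "mae": mae}
```

For the multi-class gesture labels of the ROBITA dataset, these would typically be averaged per class (macro or weighted averaging); the binary form above shows only the underlying definitions.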
Pages: 29649-29669 (21 pages)