Real-Time Lightweight Sign Language Recognition on Hybrid Deep CNN-BiLSTM Neural Network with Attention Mechanism

Cited by: 0
Authors
Kazbekova, Gulnur [1 ]
Ismagulova, Zhuldyz [2 ]
Ibrayeva, Gulmira [3 ]
Sundetova, Almagul [4 ]
Abdrazakh, Yntymak [1 ]
Baimurzayev, Boranbek [1 ]
Affiliations
[1] Khoja Akhmet Yassawi Int Kazakh Turkish Univ, Turkistan, Kazakhstan
[2] ALT Univ, Alma Ata, Kazakhstan
[3] Mil Inst Air Def Forces, Aktobe, Kazakhstan
[4] Baishev Univ, Aktobe, Kazakhstan
Keywords
Sign language recognition; CNN-BiLSTM; attention mechanism; deep learning; gesture classification; real-time processing; assistive technology;
DOI
10.14569/IJACSA.2025.0160452
CLC Number
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Sign language recognition (SLR) plays a crucial role in bridging communication gaps for individuals with hearing and speech impairments. This study proposes a hybrid deep CNN-BiLSTM neural network with an attention mechanism for real-time, lightweight sign language recognition. The CNN module extracts spatial features from individual gesture frames, while the BiLSTM module captures temporal dependencies, enhancing classification accuracy. The attention mechanism further refines feature selection by focusing on the most relevant time steps in a sign sequence. The proposed model was evaluated on the Sign Language MNIST dataset, achieving state-of-the-art performance with high accuracy, precision, recall, and F1-score. Experimental results indicate that the model converges rapidly, maintains low misclassification rates, and effectively distinguishes between visually similar signs. Confusion matrix analysis and feature map visualizations provide deeper insights into the hierarchical feature extraction process. The results demonstrate that integrating spatial, temporal, and attention-based learning significantly improves recognition performance while maintaining computational efficiency. Despite its effectiveness, challenges such as misclassification of ambiguous gestures and real-time computational constraints remain, suggesting future improvements in multi-modal fusion, transformer-based architectures, and lightweight model optimizations. The proposed approach offers a scalable and efficient solution for real-time sign language recognition, contributing to the development of assistive technologies for individuals with communication disabilities.
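The pipeline described in the abstract (per-frame CNN feature extraction, BiLSTM temporal modeling, attention-weighted pooling, then classification) can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact configuration: the layer sizes, the 28x28 grayscale input (matching Sign Language MNIST), the 24-class output, and the additive attention form are all assumptions for demonstration.

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    """Hybrid CNN-BiLSTM classifier with attention over time steps.

    Flow: per-frame CNN features -> BiLSTM over the frame sequence ->
    attention-weighted temporal pooling -> linear classifier.
    Hyperparameters here are illustrative, not the paper's.
    """

    def __init__(self, num_classes=24, hidden=64):
        super().__init__()
        # CNN: spatial features from each 28x28 grayscale frame
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),  # -> 32 * 7 * 7 features per frame
        )
        # BiLSTM: temporal dependencies across the frame sequence
        self.lstm = nn.LSTM(32 * 7 * 7, hidden, batch_first=True,
                            bidirectional=True)
        # Additive attention: one relevance score per time step
        self.attn = nn.Linear(2 * hidden, 1)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):  # x: (batch, time, 1, 28, 28)
        b, t = x.shape[:2]
        # Fold time into the batch dim so the CNN sees single frames
        feats = self.cnn(x.view(b * t, *x.shape[2:])).view(b, t, -1)
        h, _ = self.lstm(feats)                  # (b, t, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # weights over time steps
        ctx = (w * h).sum(dim=1)                 # attention-pooled context
        return self.fc(ctx)                      # (b, num_classes)

model = CNNBiLSTMAttention()
logits = model(torch.randn(2, 5, 1, 28, 28))  # 2 clips of 5 frames
print(logits.shape)  # torch.Size([2, 24])
```

Folding the time axis into the batch dimension lets a single 2D CNN process every frame in one pass; the attention step then replaces plain last-hidden-state or mean pooling with a learned weighting of the BiLSTM outputs, which is what lets the model emphasize the most informative frames of a sign.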
Pages: 510-522 (13 pages)