Deep Learning-Based Hand Gesture Recognition System and Design of a Human-Machine Interface

Cited by: 3
Authors
Sen, Abir [1]
Mishra, Tapas Kumar [1]
Dash, Ratnakar [1]
Affiliations
[1] National Institute of Technology, Department of Computer Science and Engineering, Rourkela 769008, India
Keywords
Deep learning; Hand gesture recognition; Segmentation; Vision transformer; Kalman filter; Human machine interface; Transfer learning; Virtual mouse
DOI
10.1007/s11063-023-11433-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hand gesture recognition plays an important role in developing effective human-machine interfaces (HMIs) that enable direct communication between humans and machines. In real-time scenarios, however, it is difficult to identify the correct hand gesture for controlling an application while the hands are moving. To address this issue, this work presents a low-cost human-computer interface (HCI) based on real-time hand gesture recognition. The system consists of six stages: (1) hand detection, (2) gesture segmentation, (3) feature extraction and gesture classification using five pre-trained convolutional neural network (CNN) models and a vision transformer (ViT), (4) building an interactive human-machine interface (HMI), (5) development of a gesture-controlled virtual mouse, and (6) smoothing of the virtual mouse pointer using a Kalman filter. Five pre-trained CNN models (VGG16, VGG19, ResNet50, ResNet101, and Inception-V1) and ViT are employed to classify hand gesture images. Two multi-class datasets (one public and one custom) are used to validate the models. Comparing the models' performance, Inception-V1 shows significantly better classification performance than the other four CNN models and ViT in terms of accuracy, precision, recall, and F-score. The system has also been extended to control multimedia applications (such as VLC player, an audio player, and the 2D Super-Mario-Bros game) with customized gesture commands in real-time scenarios. The average speed of the system reaches 25 fps (frames per second), which meets the requirements for real-time operation. The average response time of the proposed gesture control system, measured in milliseconds for each control, also makes it suitable for real-time use. This prototype will benefit physically disabled people interacting with desktops.
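Stage (6) of the abstract, smoothing the virtual mouse pointer with a Kalman filter, can be illustrated with a minimal sketch. The abstract does not give the filter's motion model or noise parameters, so everything below is an assumption for illustration only: a scalar random-walk (constant-position) Kalman filter applied independently to each axis, with hypothetical names `ScalarKalman` and `PointerSmoother` and placeholder noise variances `q` and `r`.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """1-D Kalman filter with a random-walk (constant-position) model.

    Assumed model, not taken from the paper: the true coordinate drifts
    slowly (process noise q) and is observed with jitter (noise r).
    """
    q: float = 1e-2   # process noise variance (placeholder value)
    r: float = 4.0    # measurement noise variance (placeholder value)
    x: float = 0.0    # current state estimate
    p: float = 1.0    # variance of the current estimate
    initialized: bool = False

    def update(self, z: float) -> float:
        # Seed the estimate with the first measurement.
        if not self.initialized:
            self.x, self.initialized = z, True
            return self.x
        # Predict: state is unchanged, uncertainty grows by q.
        self.p += self.q
        # Correct: blend prediction and measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x


class PointerSmoother:
    """Smooths (x, y) pointer positions with one filter per axis."""

    def __init__(self, q: float = 1e-2, r: float = 4.0) -> None:
        self.fx = ScalarKalman(q=q, r=r)
        self.fy = ScalarKalman(q=q, r=r)

    def smooth(self, x: float, y: float) -> tuple[float, float]:
        return self.fx.update(x), self.fy.update(y)


if __name__ == "__main__":
    # Simulated jittery detections around the point (100, 200).
    smoother = PointerSmoother()
    for i in range(10):
        jitter = 5 if i % 2 == 0 else -5
        print(smoother.smooth(100 + jitter, 200 + jitter))
```

A larger `r` relative to `q` yields a steadier but laggier pointer; a constant-velocity model would track fast hand motion better at the cost of a larger state.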
Pages: 12569-12596
Page count: 28