Attention-based hand semantic segmentation and gesture recognition using deep networks

Cited by: 9
Authors
Sarma, Debajit [1]
Dutta, H. Pallab Jyoti [1]
Yadav, Kuldeep Singh [2]
Bhuyan, M. K. [1]
Laskar, Rabul Hussain [2]
Affiliations
[1] IIT Guwahati, Dept EEE, Gauhati 781039, Assam, India
[2] NIT Silchar, Dept ECE, Silchar 788010, Assam, India
Keywords
Semantic segmentation; UNet; CBAM; VGG16; C3D; Static and dynamic hand gestures; Human-computer interaction
DOI
10.1007/s12530-023-09512-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The ability to discern the shape of the hand can be vital to improving the performance of hand gesture recognition for human-computer interaction. Segmentation is itself a challenging problem, subject to constraints such as illumination variation and complex backgrounds. The objective of the paper is to incorporate semantic segmentation into a classification problem and to use deep neural models to improve results for both static and dynamic gestures. The paper uses a UNet architecture with an attention module to obtain semantically segmented masks of the input images, which are then fed to a classifier for recognition. The attention mechanism improves segmentation accuracy. For static gestures, the top classifier layer of the VGG16 model is replaced with a classifier designed specifically for the gestures at hand. For dynamic gestures, a 3D-CNN (C3D) architecture is used as the classifier, since it can capture both the spatial and the temporal information of a gesture video. Data augmentation is applied during preprocessing to generate a sufficient number of training images for these CNN-based models. Improved recognition is achieved on both static and dynamic hand gesture databases through the inherent feature learning capability of CNNs and the refined segmentation.
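The abstract does not say how the attention module is configured or where it sits in the UNet. As a rough illustration only, the channel-attention half of CBAM (listed in the keywords) can be sketched in plain Python: each channel is summarized by average and max pooling, both summaries pass through one shared two-layer MLP, and the summed outputs are squashed by a sigmoid to give a per-channel weight. The function names, the toy feature map, and the weight matrices `w1` and `w2` below are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP without biases: out = W2 * ReLU(W1 * vec)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """CBAM-style channel attention on a feature map.

    fmap: list of C channels, each a list of H rows of W floats.
    w1:   hypothetical weights of shape (C // r) x C (reduction layer).
    w2:   hypothetical weights of shape C x (C // r) (expansion layer).
    Returns the rescaled feature map and the per-channel weights.
    """
    flat = [[v for row in ch for v in row] for ch in fmap]
    avg_desc = [sum(ch) / len(ch) for ch in flat]  # global average pooling
    max_desc = [max(ch) for ch in flat]            # global max pooling
    avg_out = shared_mlp(avg_desc, w1, w2)
    max_out = shared_mlp(max_desc, w1, w2)
    weights = [sigmoid(a + m) for a, m in zip(avg_out, max_out)]
    scaled = [[[v * wt for v in row] for row in ch]
              for ch, wt in zip(fmap, weights)]
    return scaled, weights


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature map with a reduction ratio of 2.
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
    w1 = [[1.0, 0.0]]
    w2 = [[1.0], [-1.0]]
    scaled, weights = channel_attention(fmap, w1, w2)
    print(weights)
```

In the full CBAM block a spatial-attention step follows the channel step; it is omitted here to keep the sketch short.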
Pages: 185-201
Number of pages: 17