Attention-based hand semantic segmentation and gesture recognition using deep networks

Cited by: 9

Authors:

Sarma, Debajit [1]

Dutta, H. Pallab Jyoti [1]

Yadav, Kuldeep Singh [2]

Bhuyan, M. K. [1]

Laskar, Rabul Hussain [2]

Affiliations:

[1] IIT Guwahati, Dept EEE, Gauhati 781039, Assam, India

[2] NIT Silchar, Dept ECE, Silchar 788010, Assam, India

Source:

EVOLVING SYSTEMS | 2024 / Vol. 15 / No. 01

Keywords:

Semantic segmentation; UNet; CBAM; VGG16; C3D; Static and dynamic hand gestures; Human-computer interaction;

DOI:

10.1007/s12530-023-09512-1

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Discerning the shape of the hand can be vital to improving the performance of hand gesture recognition for human-computer interaction. Segmentation is itself a challenging problem, subject to constraints such as illumination variations and complex backgrounds. The objective of the paper is to incorporate semantic segmentation into a classification problem and to use deep neural models to achieve improved results for both static and dynamic gestures. The paper uses the UNet architecture with an attention module to obtain semantically segmented masks of the input images, which are then fed to a classifier for recognition. The attention mechanism improves segmentation accuracy. For static gestures, the top classifier layer of the VGG16 model is replaced with a classifier designed specifically for the gestures at hand. For dynamic gestures, a 3D-CNN (C3D) architecture is used as a classifier that can capture both the spatial and the temporal information of a gesture video. Data augmentation is used in preprocessing to generate a sufficient number of training images for these CNN-based models. Improved recognition has been achieved on both static and dynamic hand gesture databases through the inherent feature-learning capability of CNNs and the refined segmentation.
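The keywords list CBAM as the attention module. As a rough illustration only, the sketch below implements the channel-attention half of a CBAM-style block in plain Python: each channel is summarized by its average and its maximum, both summaries pass through one shared two-layer perceptron, and the summed outputs go through a sigmoid to give one weight per channel. The function names, the toy feature map, and the weight matrices `w1` and `w2` are hypothetical; biases and the spatial-attention half are omitted, and nothing here reflects the paper's actual configuration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def shared_mlp(vec, w1, w2):
    # Two-layer perceptron without biases: C -> C/r (ReLU) -> C.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(fmap, w1, w2):
    # fmap: list of C channels, each an H x W list of lists.
    avg_desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    max_desc = [max(max(row) for row in ch) for ch in fmap]
    # The same MLP is applied to both descriptors; outputs are summed.
    summed = [a + m for a, m in zip(shared_mlp(avg_desc, w1, w2),
                                    shared_mlp(max_desc, w1, w2))]
    weights = [sigmoid(s) for s in summed]
    # Rescale every value in a channel by that channel's weight.
    refined = [[[wt * v for v in row] for row in ch] for wt, ch in zip(weights, fmap)]
    return weights, refined

# Hypothetical toy input: 2 channels of size 2 x 2, reduction ratio r = 2.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
w1 = [[0.5, 0.5]]        # shape (C/r, C) = (1, 2)
w2 = [[1.0], [-1.0]]     # shape (C, C/r) = (2, 1)
weights, refined = channel_attention(fmap, w1, w2)
```

With these made-up weights the first channel is emphasized and the second is suppressed. In a trained network the weights would be learned, and the refined feature map would typically pass on to a spatial-attention step.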


Pages: 185-201

Page count: 17

